Why do coronavirus cases graphs have a sinusoidal like shape?


Some screenshots from a Canadian website:

See this sinusoidal shape? Why is it there? Shouldn't it be a single curve?

My main guess involves when cases are counted: is it possible that they are tallied on a particular day, say every Monday? I can't believe the virus's progression really has a sinusoidal-shaped curve.

It is not an error on the website, since other sources show the same pattern: Worldometers, NBC News, and many others.

It's a weekly cycle due to reporting disruptions over the weekend.

Use a 7-day moving average to get a better picture. Holidays and the like can still be disruptive, but the 7-day average smooths out most of the weekly cycle.
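To see why the 7-day average removes the weekly cycle: each averaged value covers exactly one of every weekday, so the weekend dip and the Monday catch-up cancel out. A minimal Python sketch (the `daily_cases` numbers are invented for illustration, not real data):

```python
# Smoothing a daily case series with a trailing 7-day moving average.
# The daily_cases values are made up to mimic a weekend reporting dip.
import statistics

daily_cases = [500, 520, 540, 560, 580, 300, 200,   # weekdays high, weekend low
               550, 570, 590, 610, 630, 330, 220]

def moving_average(values, window=7):
    """Trailing moving average; entries before a full window are
    omitted so every output covers exactly `window` days."""
    return [statistics.mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

smoothed = moving_average(daily_cases)
# The raw series swings by hundreds from day to day; the smoothed
# series varies only gradually, because each value averages one
# full week and so contains exactly one of each weekday.
```

The raw series above ranges over more than 400 cases per day purely from the reporting cycle, while the smoothed series stays within roughly 45 of its mean.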

How a Sharp-Eyed Scientist Became Biology’s Image Detective

Using just her eyes and memory, Elisabeth Bik has single-handedly identified thousands of studies containing potentially doctored scientific images.

In June of 2013, Elisabeth Bik, a microbiologist, grew curious about the subject of plagiarism. She had read that scientific dishonesty was a growing problem, and she idly wondered if her work might have been stolen by others. One day, she pasted a sentence from one of her scientific papers into the Google Scholar search engine. She found that several of her sentences had been copied, without permission, in an obscure online book. She pasted a few more sentences from the same book chapter into the search box, and discovered that some of them had been purloined from other scientists’ writings.

Bik has a methodical, thorough disposition, and she analyzed the chapter over the weekend. She found that it contained text plagiarized from eighteen uncredited sources, which she categorized using color-coded highlighting. Searching out plagiarism became a kind of hobby for Bik; she began trawling Google Scholar for more cases in her off-hours, when she wasn’t working as a researcher at Stanford. She soon identified thirty faked biomedical papers, some in well-respected journals. She e-mailed the publications’ editors, and, within a few months, some of the articles were retracted.

In January, 2014, Bik was scrolling through a suspicious dissertation when she began glancing at the images, too. They included photographs known as Western blots, in which proteins appear as dark bands. Bik thought that she’d seen one particular protein band before—it had a fat little black dot at one end. Elsewhere in the dissertation, she found the same band flipped around and presented as if it were data from a different experiment. She kept looking, and spotted a dozen more Western blots that looked copied or subtly doctored. She learned that the thesis, written by a graduate student at Case Western Reserve University, had been published as two journal articles in 2010.

The presence of a flawed image in a scientific study doesn’t necessarily invalidate its central observations. But it can be a sign that something is amiss. In science, images are profoundly important: every picture and graph in a scientific paper is meant to represent data supporting the authors’ findings. Photographic images, in particular, aren’t illustrations but the evidence itself. It seemed to Bik that duplicated or doctored images could be more damaging to science than plagiarism.

Bik decided to scan through some newly published studies in PLOS One, an “open access” journal in which articles are made available to the public free of charge. (The journal’s nonprofit publisher charges authors article-processing fees.) She opened fifteen articles, each in its own browser tab, and began eyeballing the images without reading the text. In a few hours, she’d looked at around a hundred studies and spotted a few duplicate images. “It very quickly became addictive,” Bik told me, in a marked Dutch accent. Night after night, she collected problematic articles, some with duplicate Western blots, others with copied images of cells or tissues. All had passed through peer review before being accepted. A few duplications could have been innocent—perhaps a mixup by a scientist with a folder full of files. But other images had been cloned, stretched, zoomed, rotated, or reversed. The forms and patterns in biology are endlessly unique; Bik knew that these duplications couldn’t have happened by accident. Yet she didn’t want to mistakenly implicate a fellow-scientist in wrongdoing. She sent polite e-mails to the journals that had published the two Case Western studies. Editors eventually replied, promising to look into her concerns. Then six months passed with no further word. Bik was stymied.

In 2012, three scientists had created a Web site called PubPeer, where researchers could discuss one another’s published work. Critics objected to the fact that the site allowed anonymous comments. Still, PubPeer was moderated to prohibit unsubstantiated accusations, and, in several cases, unnamed whistle-blowers had used it to bring attention to image manipulations or statistical errors, spurring major corrections and retractions. It seemed to Bik that posting her findings online involved crossing a boundary: the traditional way to raise questions about a paper’s integrity was private communication with the authors, journals, or universities. She made an anonymous account anyway. “I have concerns about some figures in this paper,” she wrote, for each Case Western study. She uploaded screenshots of the image duplications, with the key areas clearly delineated by blue or red boxes, and clicked the button to submit.

Scientific publishing is a multibillion-dollar industry. In biomedicine alone, more than 1.3 million papers are published each year; in all of science, there are more than twelve thousand reputable journals. Thousands of other Web-based journals publish even the flimsiest manuscripts after sham peer review, in exchange for processing fees. In China, researchers under pressure to meet unrealistic publication quotas purchase ghostwritten papers on a black market. Meanwhile, as the Web has made it easy for journals to proliferate, professional advancement in science has increasingly depended on publishing as many studies as possible.

Around a decade ago, scientists began reckoning with the effects of this supercharged publish-or-perish system. A few cases of outright fraud—including the British study that falsely linked vaccines to autism—troubled specific scientific disciplines; in psychology, cancer research, and other fields, it was recognized that a meaningful proportion of studies had made overreaching claims and couldn’t be replicated. Reforms were introduced. Watchdog Web sites such as PubPeer and Retraction Watch sprang up, and a number of independent research-integrity detectives began unearthing cases of misconduct and sharing them through blogs, PubPeer, and on Twitter.

In March of 2019, when she was fifty-three, Bik decided to leave her job to do this detective work full time, launching a blog called Science Integrity Digest. Over the past six and a half years—while earning a bit of income from consulting and speaking, and receiving some crowdfunding—she has identified more than forty-nine hundred articles containing suspect image duplications, documenting them in a master spreadsheet. On Twitter, more than a hundred thousand people now follow her exposés.

Bik grew up with two siblings in Gouda, in the Netherlands, where her mother and physician father ran a medical practice out of their red-brick house, on a tree-lined canal. At the age of eight, Bik wanted to become an ornithologist, and spent hours with binoculars, scanning the garden for birds and recording all the species she sighted. She discovered science, earned a Ph.D. in microbiology, and moved to the United States just after 9/11, when her husband, Gerard, an optical engineer, got a job in Silicon Valley. She spent fifteen years studying the microbiome in a Stanford laboratory before moving on to the biotech industry.

When Bik first stumbled upon the image-duplication issue, a few journal editors had been writing about it, but no one had ascertained the scale of the problem. She e-mailed two prominent microbiologists, Ferric Fang and Arturo Casadevall, who had studied retractions in science publishing, introducing herself along with image duplications she’d found in Infection and Immunity and mBio—journals for which Fang and Casadevall were the editors-in-chief, respectively. The three agreed to a systematic study. Bik would screen papers in forty different journals, and Fang and Casadevall would review her findings.

In 2016, the team published their results in mBio. When journal editors examine questionable images, they typically use Photoshop tools that magnify, invert, stretch, or overlay pictures, but Bik does the same work mostly with her eyes and memory alone. Working at a speed of a few minutes per article, she had screened a jaw-dropping 20,621 studies. The team concluded that she was right ninety per cent of the time; the remaining ten per cent of images included some that were too low-resolution to allow for a clear determination. They reported “inappropriate” image duplications in seven hundred and eighty-two, or four per cent, of the papers; around a third of the flagged images involved simple copies, which could have been inadvertent errors, but at least half of the cases were sophisticated duplications that had likely been doctored. “Sometimes it seems almost like magic that the brain can do this,” Fang told me, of Bik’s abilities.

The trio estimated that, of the millions of published biomedical studies, tens of thousands ought to be retracted for unreliable or faked images. But adjusting the scientific record can be maddeningly slow, especially when research is lower-profile. In total, it took journal editors more than thirty months to retract the two Case Western papers that Bik had reported. In addition to contacting editors, Bik sometimes reaches out to research institutions, or to the Office of Research Integrity (O.R.I.), a government agency responsible for investigating misconduct in federally funded science. But the O.R.I. and institutions have protocols—they must obtain lab notebooks, conduct interviews, and so on—which take time to unfold.

By 2016, Bik had reported all seven hundred and eighty-two papers in the mBio study to journal editors (including at PLOS One). As of this June, two hundred and twenty-five had been corrected, twelve had been tagged with “expressions of concern,” and eighty-nine had been retracted. (Among them were five discredited studies by a cancer researcher at Pfizer, who was fired.) As far as Bik knows, fifty-eight per cent of the studies remain at large. In the past five years, she has reported problematic images in another 4,132 studies; only around fifteen per cent have been addressed so far. (Three hundred and eighty-two have been retracted.) In only five or ten cases has she been told that authors proved her image concerns to be unfounded, she said.

Frustrated by these long timetables, Bik has transitioned to sharing more of her findings online, where journal readers can encounter them. On PubPeer, where she is the most prolific poster who uses her real name, her comments are circumspect—she writes that images are “remarkably similar” or “more similar than expected.” On Twitter, she is more performative, and often plays to a live audience. “#ImageForensics Middle of the Night edition. Level: easy to advanced,” Bik tweeted, at 2:41 A.M. one night. She posted an array of colorful photographs that resembled abstract paintings, including a striated vista of pink and white brushstrokes (a slice of heart tissue) and a fine-grained splattering of ruby-red and white flecks (a slice of kidney). Six minutes later, a biologist in the U.K. responded: two kidney photos appeared identical, she wrote. A minute later, another user flagged the same pair, along with three lung images that looked like the same tissue sample, shifted slightly. Answers continued trickling in from others; they drew Bik-style color-coded boxes around the cloned image parts. At 3:06 A.M., Bik awarded the second user an emoji trophy for the best reply.

In Silicon Valley, Bik and her husband live in an elegant mid-century-modern ranch house with a cheerful, orange front door and a low-angled pitched roof. In the neighborhood, the residence is one of many duplicate copies sporting varying color schemes. I visited Bik just before the pandemic began. Tall, with stylish blue tortoiseshell eyeglasses and shoulder-length chestnut hair, she wore a blouse with a recurring sky-blue-and-orange floral pattern and had a penetrating, blue-eyed gaze. While Bik made tea, her husband, clad in a red fleece jacket, toasted some frozen stroopwafel cookies, from Gouda.

Playing tour guide, Bik showed off the original features of their kitchen, including its white Formica countertop, flecked with gold and black spots. “It’s random!” she assured me—no duplications. The same could not be said of the textured gray porcelain floor tiles. When workers installed them, Bik explained, she’d asked them to rotate the pieces that were identical, so that repeats would be less noticeable. A few duplicate tiles had ended up side-by-side anyway. I couldn’t see the duplication until she traced an identical wavy ridge in each tile with both of her index fingers. “Sorry—I’m, like, weird,” she said, and laughed.

In her bedroom closet, Bik’s shirts hung in a color gradient progressing from blacks and browns to greens and blues. Not long ago, she helped arrange her sister-in-law’s enormous shoe collection by color on new storage racks; when some friends complained about the messy boxes of nuts, screws, and nails that littered their garage, Bik sorted them into little drawers. “Nothing makes me more happy,” she told me. Since childhood, she has collected tortoise figurines and toys; around two thousand of them are arranged in four glass cabinets next to a blond-wood dining table. She keeps a spreadsheet tracking her turtle menagerie: there are turtles made from cowrie seashells, brass turtles, Delft blue porcelain turtles, bobble-headed turtles, turtle-shaped wood boxes with lids, and “functional” turtles (key chains, pencil sharpeners). She showed me a small stuffed animal with an eye missing: Turtle No. 1. (She has stopped adding to her collection. “I don’t want it to overtake my house,” she said.)

That afternoon, Bik settled at her dining table, which serves as her desk. Floor-to-ceiling windows offered a tranquil view of backyard foliage. On her curved widescreen monitor, Bik checked her Twitter account—her bio featured a photo of a cactus garden (“That’s me—prickly,” she said)—and then pulled up her master spreadsheet of problematic papers, which she doesn’t share publicly. Each of its thousands of entries has more than twenty columns of details. She removed her glasses, set them next to a cup of chamomile tea, sat up straight, and began rapidly scanning papers from PLOS One with her face close to the monitor. Starting with the first study—about “leucine zipper transcription factor-like 1”—she peered at an array of Western-blot images. She took screenshots and scrutinized them in Preview, zooming in and adjusting the contrast and brightness. (Occasionally, she uses Forensically and ImageTwin, tools that do some semi-automated photo-forensics analysis.) She moved on to a study with pink and purple cross-sections of mouse-gut tissue, then stopped on a figure with a dozen photos of translucent clumps of cells. She chuckled. “It looks like a flying rabbit,” she said, pointing at one blob.

Bik found no problems. PLOS One has “cleaned up their act a lot,” she said. The journal’s publisher employs a team of three editors who handle matters of publication ethics, including Bik’s cases. Renee Hoch, one of the editors, told me that the process of investigation, which entails obtaining original, raw images from the authors, and, in some cases, requesting input from external reviewers, usually takes four to six months per case. Hoch said that of the first hundred and ninety or so of Bik’s cases that the team had resolved, forty-six per cent required corrections, around forty-three per cent were retracted, and another nine per cent received “expressions of concern.” In only two of the resolved papers was nothing amiss. “In the vast majority of cases, when she raises an issue and we look into it, we agree with her assessment,” Hoch said.

Could Bik be replaced with a computer? There are arguments for the idea that automated image-scanning could be both faster and more accurate, with fewer false positives and false negatives. Hany Farid, a computer scientist and photo-forensic expert at the University of California, Berkeley, agreed that scientific misconduct is a troubling issue, but was uneasy about individual image detectives using their own judgment to publicly identify suspect images. “One wants to tread fairly lightly” when professional reputations are on the line, he told me. Farid’s reservations spring partly from a general skepticism about the accuracy of the human eye. While our visual systems excel at many tasks, such as recognizing faces, they aren’t always good at other kinds of visual discrimination. Farid sometimes provides court testimony in cases involving doctored images; his lab has designed algorithms for detecting faked photographs of everyday scenes, and they are eighty-to-ninety-five-per-cent accurate, with false positives in roughly one in a hundred cases. Judging by courtroom standards, he is unimpressed by Bik’s stats and would prefer a more rigorous assessment of her accuracy. “You can audit the algorithms,” Farid said. “You can’t audit her brain.” He would like to see similar systems designed and validated for identifying faked or altered scientific images.

A few commercial services currently offer specialized software for checking scientific images, but the programs aren’t designed for large-scale, automated use. Ideally, a program would extract images from a scientific paper, then rapidly check them against a huge database, detecting copies or manipulations. Last year, several major scientific publishers, including Elsevier, Springer Nature, and EMBO Press, convened a working group to flesh out how editors might use such systems to pre-screen manuscripts. Efforts are under way—some funded by the O.R.I.—to create powerful machine-learning algorithms to do the job. But it’s harder than one might think. Daniel Acuña, a computer scientist at Syracuse University, told me that such programs need to be trained on and tested against large data sets of published scientific images for which the “ground truth” is known: Doctored or not? A group in Berlin, funded by Elsevier, has been slowly building such a database, using images from retracted papers; some algorithm developers have also turned to Bik, who has shared her set of flawed papers with them.

Bik told me that she would welcome effective automated image-scanning systems, because they could find far more cases than she ever could. Still, even if an automated platform could identify problematic images, they would have to be reviewed by people. A computer can’t recognize when research images have been duplicated for appropriate reasons, such as for reference purposes. And, if bad images are already in the published record, someone must hound journal editors or institutions until they take action. Around forty thousand papers have received comments on PubPeer, and, for the vast majority, “there’s absolutely no response,” Boris Barbour, a neuroscientist in Paris who is a volunteer organizer for PubPeer, told me. “Even when somebody is clearly guilty of a career of cheating, it’s quite hard to see any justice done,” he said. “The scales are clearly tilted in the other direction.” Some journals are actively complicit in generating spurious papers; a former journal editor I spoke with described working at a highly profitable, low-tier publication that routinely accepted “unbelievably bad” manuscripts, which were riddled with plagiarism and blatantly faked images. Editors asked authors to supply alternative images, then published the studies after heavy editing. “I think what she’s showing is the tip of the iceberg,” the ex-editor said, of Bik.

Some university research-integrity officers point out, with chagrin, that whistle-blowing about research misconduct on social media can tip off the scientists involved, allowing them to destroy evidence ahead of an investigation. But Bik and other watchdogs find that posting to social media creates more pressure for journals and institutions to respond. Some observers worry that the airing of dirty laundry risks undermining public faith in science. Bik believes that most research is trustworthy, and regards her work as a necessary part of science’s self-correcting mechanism; universities, she told me, may be loath to investigate faculty members who bring in grant money, and publishers may hesitate to retract bad articles, since every cited paper increases a journal’s citation ranking. (In recent years, some researchers have also sued journals over retractions.) She is appalled at how editors routinely accept weak excuses for image manipulation—it’s like “the dog ate my homework,” she said. Last year, she tweeted about a study in which she’d found more than ten problematic images; the researchers supplied substitute images, and the paper received a correction. “Ugh,” she wrote. “It is like finding doping in the urine of an athlete who just won the race, and then accepting a clean urine sample 2 weeks later.”

Last year, Bik’s friend Jon Cousins, a software entrepreneur, made a computer game called Dupesy, inspired by her work. One night, after Thai takeout, we tried a beta version of the game at her computer. Bik’s husband went first, clicking a link titled “Cat Faces.”

A four-by-four panel of feline mugshots filled the screen. Some cats looked bug-eyed, others peeved. Instructions read, “Click the two unexpectedly similar images.” Gerard easily spotted the duplicates in the first few rounds, then hit a more challenging panel and sighed.

“I see it, I see it,” Bik sang quietly.

Finally, Gerard clicked the winning pair. He tried a few more Dupesy puzzle categories: a grid of rock-studded concrete walls, then “Coarse Fur,” “London Map,” and “Tokyo Buildings.”

When my turn came, I started with “Coffee Beans.” On one panel of dark-roasted beans, it took me thirty-one seconds to find the matching pair on the next, six seconds. A few panels later, I was stuck. My eyes felt crossed. A nearby clock ticked loudly.

“Should I say when I see it?” Bik asked. “Or is that annoying?”

“Just tell me when it’s annoying, because I don’t always know,” she said.

“Absolutely. You’re annoying,” he replied.

On her turn, Bik cruised swiftly through several rounds of “Coarse Fur,” then checked out other puzzle links. Some panels were “much harder than my normal work,” she said. The next day, Cousins e-mailed us with results: Bik’s median time for solving the puzzles was twelve seconds, versus about twenty seconds for her husband and me.

Llama Nanobodies: Small, Simple, Stable

The simplicity of a nanobody protein is predicted to provide greater stability, which would allow it to be delivered by an inhaler: a nanobody drug must survive being 'nebulized', the process of turning a liquid solution into an aerosol spray.

"Nebulizing an antibody, a fairly complex molecule, in general puts considerable stress on the protein," says Professor Xavier Saelens, a virologist at Ghent University in Belgium, whose team isolated the VHH-72 nanobody and is leading the hamster experiments. "There's reason to believe that a more stable nanobody would more easily withstand such stresses, but it needs to be proven."

A stable nanobody is easier to manufacture and can be stored and stockpiled, ready to be deployed if an outbreak strikes. The drug could be used as a prophylactic for those at higher risk of being infected, such as healthcare workers, since nanobodies can circulate in the bloodstream for one to two months, offering short-term protection (people would need a booster later, as the drug is slowly broken down by the body over time).

The nanobody could also be used as a therapeutic drug to treat those who don't realize that they are carrying and spreading COVID-19. "There's a period where people are asymptomatic, or fully have almost no symptoms," says Saelens. "That gives you a window of opportunity which may be convenient for some therapy."

Winter the llama (brown, center) and her photobombing friends on a farm in the Belgian countryside.

There are still questions to answer about how nanobodies work. Of the 675 antibody programmes being actively developed for the clinic last year, only 11 involved nanobodies. The FDA approved 'caplacizumab' to treat a rare blood-clotting disorder, and the biotech company which makes that drug, Ablynx, also developed 'ALX-0171', a nanobody for Respiratory Syncytial Virus. ALX-0171 advanced to phase 3 trials as an inhaled treatment, suggesting earlier results were promising, but the project is currently on hold. "There's no product on the market based on llamas or based on nanobodies that's being inhaled, as far as I know," says Saelens.

Nanobodies are not new but have only recently gained recognition. Saelens points out that camelid antibodies were discovered by a fellow Belgian scientist, Raymond Hamers at the Free University of Brussels, who found them by chance while studying the blood of a dromedary camel. "If that coincidental finding had been missed or overlooked, we wouldn't have access to these special antibodies today," he says. "Curiosity-driven science can lead to applications a long time after the discovery."

When Should I Use Logarithmic Scales in My Charts and Graphs?

There are two main reasons to use logarithmic scales in charts and graphs. The first is to respond to skewness towards large values, i.e., cases in which one or a few points are much larger than the bulk of the data. The second is to show percent change or multiplicative factors. First I will review what we mean by logarithms. Then I will provide more detail about each of these reasons and give examples.

To refresh your memory of school math, logs are just another way of writing exponential equations, one that allows you to separate the exponent on one side of the equation. The equation 2^4 = 16 can be rewritten as log_2(16) = 4 and pronounced "log to the base 2 of 16 is 4." It is helpful to remember that the log is the exponent, in this case "4". The equation y = log_b(x) means that y is the power or exponent that b is raised to in order to get x. The common base for logarithmic scales is base 10. However, other bases are also useful. While a base of ten is useful when the data range over several orders of magnitude, a base of two is useful when the data have a smaller range.
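These facts are easy to check numerically. A small Python sketch (the names `b`, `x`, and `y` are just illustrative):

```python
# A quick numerical check of the refresher above: the log is the exponent.
import math

assert math.log2(16) == 4        # 2^4 = 16, so log base 2 of 16 is 4
assert math.log10(1000) == 3     # 10^3 = 1000, so log base 10 of 1000 is 3

# y = log_b(x) means that b raised to the power y gives back x
# (here with b = 2 and x = 16):
b, x = 2, 16
y = math.log2(x)                 # y is the exponent, 4.0
assert b ** y == x
```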

Figure 1. Dot plot of revenues of the top 60 Fortune 500 companies.

Figure 1 uses a dot plot to show the revenues of the top 60 companies on the 2011 Fortune 500 list, which provides revenues for 2010. One reason for choosing a dot plot rather than a bar chart is that it is less cluttered. We will be learning other benefits of dot plots in this and future posts.

Wal-mart Stores and Exxon-Mobil have much larger revenues than the other companies. As a result, the differences in the revenues of the other companies are compressed, making these differences more difficult to judge.

Figure 2. Dot plot of revenues of top 60 Fortune 500 companies on a log scale with base 2.

The same data are plotted in Figure 2 on a logarithmic scale with base 2. My reason for using base 2 was to avoid the tick marks with decimal exponents that base 10 would have produced. The data range from about 40 to about 400. That’s not too many orders of magnitude. Figure 3 plots the data with logs to the base 10 with tick labels in powers of ten. If we want more than one or two tick marks we get the decimal exponents shown in Figure 3. Using the base 2 avoids this problem. Next week we will discuss alternative ways of labeling log scales.
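The tick-count argument can be made concrete. Assuming ticks are placed at integer powers of the base (an assumption for illustration; real charting tools offer more options), here is a quick Python check of how many such ticks fall inside the data range of about 40 to 400:

```python
# Counting integer-power tick marks inside the data range 40..400,
# to show why base 2 gives a usable set of ticks while base 10 does not.
lo, hi = 40, 400

ticks_base10 = [10 ** k for k in range(0, 4) if lo <= 10 ** k <= hi]
ticks_base2  = [2 ** k for k in range(0, 10) if lo <= 2 ** k <= hi]

# Base 10 yields a single tick (100): too sparse without decimal exponents.
# Base 2 yields several whole-number ticks: 64, 128, 256.
```

This is exactly the situation in the text: to get more than one or two base-10 ticks over this range you are forced into decimal exponents, which base 2 avoids.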

Figure 3. Dot plot of the data of Figure 2 shown on a log scale with base 10

A dot plot is judged by position along an axis, in this case the horizontal or x-axis. A bar chart is judged by the length of the bar. I don’t like using lengths with logarithmic scales. That is a second reason that I prefer dot plots over bar charts for these data.

In Figure 2, the value of each tick mark is double the value of the preceding one. The top axis emphasizes the fact that the data are logs. The bottom axis shows the values in the original scale. This labeling follows the advice of William Cleveland with the top and bottom axes interchanged. The data values are spread out better with the logarithmic scale. This is what I mean by responding to skewness of large values. The revenue for Boeing is about 2^6 billion dollars, while the revenue for Ford Motor is about 2^7. In Figure 1, the linear scale, the revenue for Ford is the revenue for Boeing plus the difference between these two revenues. We call this additive. Since 2^6 = 64 and 2^7 = 128, we see that the difference is about 64 billion dollars. In Figure 2 the difference is multiplicative. Since 2^7 = 2^6 times 2, we see that the revenues for Ford Motor are about double those for Boeing. This is what I mean by saying that we use logarithmic scales to show multiplicative factors.
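The additive-versus-multiplicative reading can be verified directly. A short Python sketch using the approximate revenues quoted above:

```python
# Additive vs. multiplicative readings of the Boeing/Ford comparison.
# Revenues are the approximate figures from the text, in billions.
import math

boeing, ford = 64.0, 128.0

# Linear scale: positions encode the values, so we read differences.
assert ford - boeing == 64.0                       # ~64 billion apart

# Log scale: positions encode the logs, so equal spacing means equal ratios.
assert math.log2(ford) - math.log2(boeing) == 1.0  # one base-2 tick apart
assert ford / boeing == 2.0                        # i.e., a factor of two
```

One tick of separation on a base-2 log axis always means a factor of two, regardless of where on the axis it occurs.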

The previous example showed both responding to large values and multiplicative factors. The next example just describes rates of change. Suppose we had one widget in 1999 and doubled the number each year. The following charts show the number of widgets on a linear and logarithmic scale:

Figure 4. A comparison of linear and logarithmic (log) scales

The linear scale shows the absolute number of widgets over time while the logarithmic scale shows the rate of change of the number of widgets over time. The bottom chart of Figure 4 makes it much clearer that the rate of change or growth rate is constant.
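The widget series can be sketched in a few lines; the constant log-scale steps are what make the bottom chart of Figure 4 a straight line:

```python
# One widget in 1999, doubling each year: constant growth rate.
import math

years = range(1999, 2009)
widgets = [2 ** (y - 1999) for y in years]   # 1, 2, 4, ..., 512

# On a linear scale, the yearly increments themselves keep growing
# (1, 2, 4, ..., 256), which is why the top chart looks explosive.
linear_steps = [b - a for a, b in zip(widgets, widgets[1:])]

# On a log-2 scale, every yearly step is exactly one unit, so the
# series plots as a straight line with constant slope.
log_steps = [math.log2(b) - math.log2(a) for a, b in zip(widgets, widgets[1:])]
```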

Dr. Nicolas Bissantz in his blog, Me, Myself, and BI, would call the linear chart a panic chart. He says that “line charts are speed charts.” That is, they show the rate of change or slope of the number of widgets. A chart with a linear scale similar to the top chart of Figure 4 showing a quantity such as our national debt causes panic even if the rate of change is constant.

Logarithmic scales are extremely useful but are not understood by all. As in all presentations, designers must know their audiences.

†Joint UNIversities Pandemic and Epidemiological Research. See

Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.



COVID-19, lies and statistics: Corruption and the pandemic

Children at the Public Health Initiative in Karnataka, India. Researchers in India found that COVID-19 infections in the country had been grossly underestimated and could be up to 95 times higher than the official numbers. Copyright: Trinity Care Foundation (CC BY-NC-ND 2.0).

From Brazil to the Philippines, secretive governments across the world are responding to the COVID-19 pandemic by covering up data and bypassing public procurement rules, undermining trust in health systems, fuelling anti-vaxxers and putting immunisation campaigns at risk.

Clandestine contracts for medical goods and services have become the norm in many countries, while data on COVID-19 cases and deaths has been manipulated and underreported.

Authorities and heads of state have used the pandemic as an opportunity to gut public bodies dedicated to openness and communication, with the worst forming a rogues' gallery of coronavirus offenders.

In the global South, the repercussions for already struggling health and governance systems could be catastrophic.

This is a tale of two pandemics, says Jonathan Cushing, who leads on global health at the anti-corruption non-profit Transparency International.

"You have COVID-19 and then what we've seen over the past year is this lack of transparency—the utilization of direct procurement legislation because of the emergency needs at the time," Cushing tells SciDev.Net.

"We've seen repeated cases of corruption, and that is the second pandemic in many ways."

Malpractice, says Cushing, has been reported "around the world, in the Philippines, in Uganda, we've seen cases raised in Kenya, Latin America as well".

"As the pandemic has progressed, we've seen the shift from the rush to buy [personal protective equipment] and ventilators … to the procurement of vaccines," says Cushing.

"What we're seeing now is a complete lack of transparency."

Tanzanian President John Magufuli may have been a victim of his own refusal to acknowledge the presence and seriousness of SARS-CoV-2, the virus that causes COVID-19.

In June, Magufuli declared that "the corona disease has been eliminated by God", leaving Tanzania "coronavirus-free". The government stopped publishing data on case numbers, while disease surveillance and advocacy wound down.

Magufuli's death—officially from heart problems, but widely believed to be connected with COVID-19—was announced on 17 March.

Residents of Tanzania's largest city, Dar es Salaam, told SciDev.Net it remained to be seen whether Tanzania's government would reverse Magufuli's increasingly authoritarian policies. But new president Samia Suluhu Hassan has already marked a departure from her predecessor, pledging to form a scientific advisory committee on COVID-19.

Epidemiologists say statistics will be central to any new evidence-driven response. For now, the data deficit remains as people continue to fall ill.

"What's lacking now in Tanzania is an enabling environment that allows scientific enquiries on such things as pandemics," says Frank Minja, a Tanzanian doctor and associate professor of neuroradiology at the Yale School of Medicine's radiology and biomedical imaging department, in the US.

"Of course, science doesn't work in isolation, it needs economic and political decisions. But science is a tool that should be used to solve our problems."

For the countries that are publishing statistics, many of these have been massaged to reveal a rosier version of reality.

Data manipulation is a key marker of COVID-19 corruption, according to Transparency International. This can have devastating consequences, such as resources misallocation, spikes in case rates as citizens are encouraged to carry on as normal, and increased mistrust in governments when reality does not match with the official version of events.

Multiple studies, including those based on rates of population with COVID-19 antibodies, have suggested that SARS-CoV-2 is more prevalent in many countries than official statistics reveal.

One such study, carried out in Zambia and published in the BMJ, observed "a surprisingly high prevalence of COVID-19 mortality".

The research team, led by Lawrence Mwananyanda from Right to Care, said: "Contradicting the prevailing narrative that COVID-19 has spared Africa, COVID-19 has had a severe impact in Zambia. How this was missed is largely explained by low testing rates, not by a low prevalence of COVID-19.

"If our data are generalisable to other settings in Africa, the answer to the question, 'Why did COVID-19 skip Africa?' is that it didn't."

In response, pathologists from the Ministry of Home Affairs and Zambia's University Teaching Hospital said the conclusions from the Lusaka study were "highly questionable" and could not be extrapolated to all of Sub-Saharan Africa.

Researchers in India found that COVID-19 infections had been grossly underestimated and could be up to 95 times higher than the official numbers. Health and development economist Anup Malani told SciDev.Net that the high seroprevalence (the proportion of people testing positive for antibodies to the virus) in the rural areas studied was due to mass migration from the cities to escape lockdown restrictions.

In Brazil, the country's health ministry removed cumulative COVID-19 data from its website in June as President Jair Bolsonaro declared that the statistics did not "reflect the moment the country is in".

The supreme court ordered that the data be restored, and the figures now indicate that Brazil is among the worst affected countries in the world, as the government appoints its fourth health minister since the pandemic began.

Attacking the messengers

Those who refuse to toe government lines have faced repercussions, from losing their jobs to legal intimidation and verbal attacks.

After she criticized Philippines President Rodrigo Duterte and his government over its handling of the COVID-19 crisis, a barrage of online abuse was directed at Maria Ressa, the founder of the Philippines-based news site Rappler.

Ressa is an outspoken Duterte critic, but analysis by the International Center for Journalists (ICFJ) shows that the second-highest spike in abuse of Ressa correlated to her questioning the official reporting of COVID-19 data.

Scientists argue that government data indicating low COVID-19 case numbers in Tanzania, Mexico and the Philippines misrepresents the reality of the health crisis. The graph reflects official data sourced by the Johns Hopkins University COVID-19 map, as of 26 March.

"Ressa lives at the core of a very 21st century storm," says the report, led by ICFJ global director of research Julie Posetti. "It is a furore of disinformation and attacks—one in which credible journalists are subjected to online violence with impunity where facts wither and democracies teeter."

Posetti says that state threats and digital violence put journalists—and journalism—at risk in the real world. In January, the tenth arrest warrant was issued against Ressa, who is facing multiple libel proceedings, on top of a six-year jail term, which Ressa is appealing.

Anthony Leachon, a leading physician and public health expert in the Philippines, was forced to quit as a special adviser on the COVID-19 taskforce in June after he publicly disagreed with the Duterte administration's handling of the pandemic, and argued that the taskforce had succumbed to political pressure.

Leachon also criticized the health department, saying its reporting of COVID-19 was unreliable and delayed, and questioned why the government was favoring vaccines from companies with no safety and efficacy data. The Philippines government has approved the Sinovac and Sinopharm vaccines for people aged under 65.

Leachon has continued to call on authorities to release real-time data on COVID-19 infections, which he said will allow local officials to quickly respond with appropriate policies.

For speaking independently, Leachon has been ridiculed and mocked by Duterte supporters. The administration's spokesman, Harry Roque, said that Duterte repeatedly cursed Leachon during a December cabinet meeting on COVID-19.

"This lack of transparency and urgency is the reason why the Philippines is in a mess right now and has fallen behind its peers in the region," Leachon tells SciDev.Net.

"We are in the longest continuous lockdown in the world. We are still in quarantine yet we are now seeing a surge in cases … [and] we are still in the middle of the debate on what vaccines to procure, how and what are the selection guidelines."

In March, the Philippines received its first delivery of vaccines from the COVAX facility—the international partnership established to support vaccine access for low- and middle-income countries. Leachon warns that the lack of openness and transparency in the country risks undermining confidence in COVID-19 vaccines.

Attacks on Mexico's independent freedom of information body—the National Institute for Transparency, Access to Information and Protection of Personal Data (INAI)—intensified with the onset of the pandemic.

In 2019, President Andrés Manuel López Obrador's first year in office, INAI reported that the administration was legally challenging at least 30 information requests that the institute had approved.

In January, López Obrador announced a proposal to shut down INAI and replace it with a government agency, a move that Human Rights Watch Americas director José Miguel Vivanco called "the perfect recipe for secrecy and abuse".

"The INAI has played a crucial role in protecting privacy and ensuring the public can access information about government corruption and human rights violations," Vivanco said.

Janet Oropeza Eng, an accountability and anti-corruption researcher at the civil society organization Fundar, says that INAI has been critical to efforts to monitor federal government agencies over the past 10 years.

"If INAI disappears or becomes dependent on the executive [branch of government], it would be a regression in terms of independence and autonomy. It is not possible to be both judge and party," Eng tells SciDev.Net.

"The executive could not order itself to open information. For Mexicans, it would be a step backwards in the right to information that was already guaranteed."

Transparency International's Mexico branch has made repeated freedom of information requests for pricing transparency, but "gets nowhere", according to Jonathan Cushing, from the organization's global health team.

"And it's not unique to Mexico, it's Argentina, in Pakistan they're trying the same thing. It's a complete shutdown," he says.

Cryptic references to 'pneumonia' became one of the only ways Tanzanians could talk openly about the disease outbreak, after the government passed a regulation prohibiting reporting on COVID-19.

Under the online content regulation, publishing "public information that may cause public havoc and disorder" is banned, including "content with information with regards to the outbreak of a deadly or contagious diseases [sic] in the country or elsewhere without the approval of the respective authorities".

Breaches could be met with imprisonment for a minimum of 12 months, a fine of at least five million shillings (US$2155), or both.

Adolf Mkono, a resident of Tanzania's northern Kagera region, which borders Burundi, Rwanda, Uganda and Lake Victoria, says the fear of being charged under the new regulation is stifling information-sharing.

"Since the end of last year, I have witnessed a number of people dying after complaining of difficulty in breathing. This isn't normal," says Mkono.

"Maybe it is not COVID-19, but the government should come out and explain what's causing these deaths. We don't have the freedom or the facts to say if it is COVID-19," Mkono tells SciDev.Net in an interview conducted in January this year, before the decision by Tanzania's new President Samia Suluhu Hassan to form a scientific advisory committee on COVID-19.

Tanzania's health ministry did not directly respond to these claims when SciDev.Net sought comment. But the head of communications shared a video of the ministry's principal secretary, Mabula Mchembe, saying: "It is not true that every patient who complains of breathing difficulties should be believed to have COVID-19. I have visited many hospitals and what I can conclude is that the COVID-19 situation being talked about is based on social media claims that aren't true."

Procurement behind closed doors

COVID-19 vaccine producers have required governments around the world to sign non-disclosure agreements to keep the price per dose a secret.

"Some countries have created special commissions to negotiate the purchase of COVID-19 vaccines," the United Nations Office on Drugs and Crime said in a January policy paper.

"There can be a lack of transparency, and thus a potential risk of corruption in what these agreements entail."

With emergency procurement procedures and vaccine negotiations hidden from public view, health system corruption can have a speedy—and devastating—knock-on effect for disease containment.

On 27 March, Mexico's government quietly revised its official COVID-19 statistics, acknowledging that the true death toll may be 60 percent higher than previously reported.

The red bar indicates the level of ‘highly explicit abuse’ directed at Maria Ressa after she criticised the Philippines’ handling of the COVID-19 pandemic, based on data compiled by the ICFJ

The country went from being praised by the Pan-American Health Organization for its initial response to the pandemic, to being ranked among the worst performing countries in the world as cases and deaths climbed.

In April 2020 the government issued an open-ended decree authorizing the purchase of coronavirus-related goods and services without the need to carry out the public bidding process.

The result, according to the data analysis project COVID Purchases (ComprasCOVID.MX), by human rights NGO PODER and data journalism initiative Serendipia, was that almost 95 percent of Mexico's procurement contracts in 2020 were direct award contracts—that is, awarded without competition.

This compares with 2019, when about 78 percent of public contracts were direct awards, according to civil society organization Mexicans Against Corruption and Impunity.

And concerns have been raised about the suitability of companies awarded tenders, with many lacking experience in the medical fields or linked to previous corruption cases.

"The worrying issue is that all levels of government—not only federal—are abusing the emergency decree to continue with direct awards without any restrictions," Eduardo Bohórquez, director of Transparency International's country branch Transparencia Mexicana, told SciDev.Net.

Data gathered by ComprasCOVID.MX shows that government institutions paid wildly different prices for vital personal protective equipment, such as face masks.

While some government departments paid one Mexican peso—about five US cents—per mask, others paid up to 405 pesos—almost US$20.

Bohórquez says that without a sunset clause in the emergency procurement legislation, Mexico's public funds and institutions are left vulnerable to corruption.

While private vaccine holidays to the United Arab Emirates are reportedly up for grabs, Pakistan is believed to be the first country to have approved the private import and commercial sale of COVID-19 vaccines.

Transparency advocates have urged the government to abandon the proposal, which would allow Sputnik-V vaccines to be sold at a price around 160 percent higher than the global set price of US$10 per dose.

This would "provide a window of corruption", as it could mean government vaccines end up in private hospitals for commercial sale and price gouging in the public sector, said Nasira Iqbal, former justice of the Lahore High Court and vice-chair of Transparency International Pakistan, in a letter to the Prime Minister.

"The government should not encourage such a policy of favoring a certain section of the society at the cost of transparency," Iqbal went on to say.

In response, the Ministry of National Health Services Regulations and Coordination secretary Aamir Ashraf Khawaja said private vaccine sales were just one element of the government's COVID-19 response.

"It was a well considered decision of the Federal government to allow private sector to import vaccine as the national vaccination priorities favored the healthcare workers and the elderly, involving some lag in reaching other segments of the society," Khawaja said in a letter.

"It may be added that the government is fixing the maximum retail price, leaving room for competition and free market dynamics."

Potential conflicts of interest between authorities and private drug companies leave opportunity for corruption, say analysts.

SciDev.Net contacted the Pakistan government's Press Information Department for comment, but did not receive a response by the time of publication.

A Pakistan-based economist, who asked not to be named for fear of government or military reprisals, told SciDev.Net that while there were few confirmed reports of coronavirus corruption in Pakistan, rumors abound in a country known for its systemic and endemic corruption.

"On the policy side, there are rumors of corruption in the purchase of ventilators and in suppressing case and death data in the province of Punjab early on to create an impression that the incidence of the virus was low," the economist said.

"On the health front, there has been a lot of skepticism on testing for COVID-19. It is again rumored that for travel purposes, private labs were on purpose providing negative results. This was mainly done for international travel. There was no monitoring, either by the press or by NGOs."

Additionally, analysts say that the US$1.386 billion pledged to Pakistan in April 2020 under the IMF's Rapid Financing Instrument was yet to be fully disbursed.

Open science, open societies

The first casualty of coronavirus corruption could well be public trust, says Cushing, as skewed statistics and bent procurement processes give vaccine skeptics ammunition that could hamper responses to the pandemic.

"In the current scenario in the pandemic, trust is key. We need to ensure that there is trust in the system," Cushing said.

"Part of transparency is about stopping the corruption, stopping those individuals making profit illegally and immorally from this.

"But it's also about letting people understand and participate in decision making. If you know what's going on and you can understand what decisions are being taken and can help shape that, it makes it much easier to hold governments accountable."

The consequences for societies could reach beyond the immediate disease threat. Without open data and open science, communities will be left out of important conversations and in the dark, says Cushing.

Health bodies require data to support "rational decision making" when it comes to prioritizing which drugs to buy. "If that's not there, then at best you could be wasting money, at worst it could be impacting on health outcomes," said Cushing.

"The connection between governance and healthy societies is exactly that—you want transparent governance systems and the more transparent they are and the more people can engage in them, there is greater trust in that system.

"That should allow that conversation to happen about where the country goes in terms of health, development, socioeconomic priorities."


Reckoning With Uncertainty

Models that rely on fixed assumptions are not the only ones that need to be navigated with care. Even complex epidemiological models with built-in mechanisms to account for changing conditions deal in uncertainties that must be handled and communicated cautiously.

As the epidemic emerged around her in Spain, Susanna Manrubia, a systems biologist at the Spanish National Center for Biotechnology in Madrid, became increasingly concerned about how the results of various models were being publicized. “Our government was claiming, ‘We’ll be reaching the peak of the propagation by Friday,’ and then ‘No, maybe mid-next week,’” she said. “And they were all systematically wrong, as we would have expected,” because no one was paying attention to the uncertainty in the projections, which caused wild shifts with every update to the data.

“It was clear to us,” Manrubia said, “that this was not something that you could just so carelessly say.” So she set out to characterize the uncertainty rooted in the intrinsically unpredictable system that everyone was trying to model, and to determine how that uncertainty escalated throughout the modeling process.

Manrubia and her team were able to fit their models very well to past data, accurately describing the transmission dynamics of COVID-19 throughout Spain. But when they attempted to predict what would happen next, their estimates diverged considerably, sometimes in entirely contradictory ways.

Manrubia’s group was discovering a depressing reality: The peak of an epidemic could never be estimated until it happened; the same was true for the end of the epidemic. Work in other labs has similarly shown that attempting to predict plateaus in the epidemic curve over the long term is just as fruitless. One study found that researchers shouldn’t even try to estimate a peak or other landmark in a curve until the number of infections is two-thirds of the way there.

“People say, ‘I can reproduce the past; therefore, I can predict the future,’” Manrubia said. But while “these models are very illustrative of the underlying dynamics … they have no predictive power.”
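Manrubia's point, that a model can fit the past well yet say little about the future, can be illustrated with a toy calculation. This is not taken from her study; every number below is invented for illustration. Two logistic epidemic curves whose final sizes differ by a factor of ten can be made nearly indistinguishable during the early, exponential-looking phase, so data from that phase cannot pin down the peak or the final size.

```python
import math

def logistic(t, final_size, r=0.3, midpoint=30.0):
    """Cumulative case count of an idealized logistic epidemic curve."""
    return final_size / (1 + math.exp(-r * (t - midpoint)))

# Two hypothetical epidemics: the second has a tenfold larger final size,
# but its midpoint is shifted so that both curves coincide while growth
# still looks exponential.
shift = math.log(10) / 0.3

def small_epidemic(t):
    return logistic(t, final_size=1_000)

def big_epidemic(t):
    return logistic(t, final_size=10_000, midpoint=30.0 + shift)

# During the first two weeks the two curves agree to within about 1%...
for t in range(15):
    rel_diff = abs(small_epidemic(t) - big_epidemic(t)) / big_epidemic(t)
    assert rel_diff < 0.01

# ...yet one epidemic peaks around day 30 and ultimately infects about
# 1,000 people, while the other peaks around day 38 and infects 10,000.
```

Early case counts are consistent with wildly different futures, which is one reason peak estimates kept shifting with every update to the data.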

The consequences of the unpredictability of those peaks have been felt. Encouraged by what seemed like downturns in the COVID-19 numbers, many regions, cities and schools reopened too early.

Students Michelle Vu (left) and Klaudia Bak at Pennsylvania State University, after it reopened its campus last fall. School reopening plans across the country had to adjust to unexpected behaviors and events that had the potential to cause large outbreaks.

Ferrari and his colleagues at Penn State, for instance, had to confront that possibility when they started making projections in March about what August might look like, to inform their more granular planning models for bringing students back to campus. At the time, it seemed as if the first wave of infections would be past its peak and declining by the summer, so Ferrari and the rest of the modeling team assumed that their focus should be on implementing policies to head off a second wave when the students returned for the fall.

“And then the reality was, as we got closer and closer, all of a sudden we’re in June and we’re in July, and we’re all yelling, ‘Hey, the first wave’s not going to be over,’” Ferrari said. But the reopening plans were already in motion. Students were coming back to a campus where the risk might be much greater than anticipated — which left the team scrambling to find an adequate response.

Why Scientists Need To Be Better at Visualising Data

Imagine a science textbook without images. No charts, no graphs, no illustrations or diagrams with arrows and labels. The science would be a lot harder to understand.

That’s because humans are visual creatures by nature. People absorb information in graphic form that would elude them in words. Images are effective for all kinds of storytelling, especially when the story is complicated, as it so often is with science. Scientific visuals can be essential for analyzing data, communicating experimental results and even for making surprising discoveries.

Visualisations can reveal patterns, trends and connections in data that are difficult or impossible to find any other way, says Bang Wong, creative director of MIT’s Broad Institute. “Plotting the data allows us to see the underlying structure of the data that you wouldn’t otherwise see if you’re looking at a table.”

And yet few scientists take the same amount of care with visuals as they do with generating data or writing about it. The graphs and diagrams that accompany most scientific publications tend to be the last things researchers do, says data visualisation scientist Seán O’Donoghue. “Visualisation is seen as really just kind of an icing on the cake.”

As a result, science is littered with poor data visualisations that confound readers and can even mislead the scientists who make them. Deficient data visuals can reduce the quality and impede the progress of scientific research. And with more and more scientific images making their way into the news and onto social media – illustrating everything from climate change to disease outbreaks – the potential is high for bad visuals to impair public understanding of science.

The problem has become more acute with the ever-increasing amount and complexity of scientific data. Visualisation of those data – to understand as well as to share them – is more important than ever. Yet scientists receive very little visualisation training. “The community hasn’t by and large recognised that this is something that really is needed,” says O’Donoghue, of the University of New South Wales and lead author of a paper about biomedical data visualisation in the 2018 Annual Review of Biomedical Data Science.

There are signs of progress, however. At least two annual conferences dedicated to scientific data visualisation have sprung up in the last decade. And the journal Nature Methods ran a regular column from 2010 to 2016 about creating better figures and graphs, which was then adapted into guidelines for scientists submitting papers to that journal. But so far, few scientists are focusing on the problem.

Improving scientific visualisation will require better understanding of the strengths, weaknesses and biases of how the human brain perceives the world. Fortunately, research has begun to reveal how people read, and misread, different kinds of visualisations and which types of charts are most effective and easiest to decipher. Applying that knowledge should lead to better visual communication of science.

“We have a lot of practical knowledge about what works and what doesn’t,” says computer scientist Miriah Meyer of the University of Utah. “There are a lot of principles that have been through the test of time and have been shown over and over again to be really effective.”

Chart choice

The human visual system evolved to help us survive and thrive in the natural world, not to read graphs. Our brains interpret what our eyes see in ways that can help us find edible plants among the toxic varieties, spot prey animals and see reasonably well in both broad daylight and at night. By analyzing the information we receive through our eyes to serve these purposes, our brains give us a tailored perception of the world.

In the early 1980s, Bell Labs statisticians William Cleveland and Robert McGill began researching how the particulars of human perception affect our ability to decipher graphic displays of data – to discover which kinds of charts play to our strengths and which ones we struggle with. In a seminal paper published in 1984 in the Journal of the American Statistical Association, Cleveland and McGill presented a ranking of visual elements according to how easily and accurately people read them.

Their experiments showed that people are best at reading charts based on the lengths of bars or lines, such as in a standard bar chart. These visualisations are the best choice when it’s important to accurately discern small differences between values.

Study participants found it somewhat harder to judge differences in direction, angle and area. Figures using volume, curvature or shading to represent data were even tougher. And the least effective method of all was colour saturation.

“The ability of the audience to perceive minute differences is going to get worse and worse” as you move down the list, says computer scientist Jeffrey Heer of the University of Washington in Seattle. In general, it’s best practice to use the highest graphical element on the list that meets the needs of each type of data.

For example, if it’s important to show that one particular disease is far more lethal than others, a graphic using the size of circles to represent the numbers of deaths will do fine. But to emphasise much smaller differences in the numbers of deaths among the less-lethal diseases, a bar chart will be far more effective.

In 2010, Heer used Amazon’s Mechanical Turk crowdsourcing service to confirm that Cleveland and McGill’s ranking holds true in the modern digital environment. Since then, Heer, O’Donoghue and others have used crowdsourcing to test many other aspects of visualisation to find out what works best. “That has huge power going forward to take this whole field and really give it a solid engineering basis,” O’Donoghue says.

Pernicious pies

Cleveland and McGill’s graphical ranking highlights why some popular types of figures aren’t very effective. A good example is the ever-popular pie chart, which has earned the disdain of data visualisation experts like Edward Tufte. In his influential 1983 treatise, The Visual Display of Quantitative Information, Tufte wrote that “the only design worse than a pie chart is several of them.”

Pie charts are often used to compare parts of a whole, a cognitively challenging visual task. The reader needs to judge either differences between the areas of the pie slices, or between the angles at the center of the chart: Both types of estimations are more difficult than discerning the difference in lengths of bars on a bar chart, which would be a better option in many instances.

Pie charts can be tempting because they are generally more attractive than bar charts, are easy to fill with colours and are simple to make. But they are rarely the best choice and are acceptable only in limited contexts. If the goal is to show that the parts add up to a whole, or to compare the parts with that whole (rather than comparing slices with each other), a well-executed pie chart might suffice as long as precision isn’t crucial.

For example, a pie chart that depicts how much each economic sector contributes to greenhouse gas emissions nicely shows that around half come from electricity and heat production along with agriculture, forestry and other land use. Transportation, which often gets the most attention, makes up a much smaller piece of the pie. Putting six bars next to each other in this case doesn’t immediately show that the parts add up to 100 percent or what proportion of the whole each bar represents.

In some scientific disciplines, the pie chart is simply standard practice for displaying specific types of data. And it’s hard to buck tradition. “There are certain areas in epigenetics where we have to show the pie chart,” says Wong, who works with biomedical scientists at the Broad Institute to create better visualisations. “I know the shortcomings of a pie chart, but it’s always been shown as a pie chart in every publication, so people hold on to that very tight.”

In other instances, the extra work pies ask of the human brain makes them poor vehicles for delivering accurate information or a coherent story.

Behind bars

Though bar graphs are easy to read and understand, that doesn’t mean they’re always the best choice. In some fields, such as psychology, medicine and physiology, bar graphs can often misrepresent the underlying data and mask important details.

“Bar graphs are something that you should use if you are visualising counts or proportions,” says Tracey Weissgerber, a physiologist at the Mayo Clinic in Rochester, Minnesota, who studies how research is done and reported. “But they’re not a very effective strategy for visualising continuous data.”

Weissgerber conducted a survey of top physiology journals in 2015 and found that some 85% of papers contained at least one bar graph representing continuous data – things like measurements of blood pressure or temperature where each sample can have any value within the relevant range. But bars representing continuous data can fail to show some significant information, such as how many samples are represented by each bar and whether there are subgroups within a bar.

For example, Weissgerber notes that the pregnancy complication preeclampsia can stem from problems with the mother or from problems with the baby or placenta. But within those groups are subgroups of patients who arrive at the same symptoms through different pathways. “We’re really focused on trying to understand and identify women with different subtypes of preeclampsia,” Weissgerber says. “And one of the problems with that is if we’re presenting all of our data in a bar graph, there are no subgroups in a bar graph.”

Bar charts are especially problematic for studies with small sample sizes, which are common in the basic biomedical sciences. Bars don’t show how small the sample sizes are, and outliers can have a big effect on the mean indicated by the height of a bar.
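The pull a single outlier exerts on a bar's height is easy to demonstrate numerically. A minimal sketch using only Python's standard library (the sample values are invented purely for illustration):

```python
from statistics import mean, median

# Hypothetical small continuous samples, as are common in basic
# biomedical studies. Values are invented for illustration only.
control = [4.1, 4.3, 3.9, 4.2, 4.0]
treated = [4.0, 4.2, 4.1, 3.9, 12.5]  # one outlier

# A bar chart of group means suggests a large treatment effect...
print(mean(control))  # ~4.1
print(mean(treated))  # ~5.74

# ...but the medians show the groups are nearly identical:
# the apparent effect is driven by a single point, which the
# bar's height silently absorbs.
print(median(control))  # 4.1
print(median(treated))  # 4.1
```

A scatterplot of the raw points would make the outlier, and the small sample size, visible at a glance.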

“One of the challenges is that in many areas of the basic biomedical sciences, bar graphs are just accepted as … how we show continuous data,” Weissgerber says.

There are several good alternative graphs for small continuous data sets. Scatterplots, box plots and histograms all reveal the distribution of the data, but they were rarely used in the papers Weissgerber analysed. To help correct this problem, she has developed tools to create simple scatterplots and various kinds of interactive graphs.
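What a box plot adds over a bar is essentially the quartile structure of the data, which is cheap to compute. A sketch with Python's standard library (the sample values are invented for illustration):

```python
from statistics import mean, quantiles

# Hypothetical small continuous sample with a right-skewed tail.
sample = [3.8, 4.0, 4.1, 4.1, 4.3, 4.6, 9.0]

# A bar chart reduces all of this to a single height: the mean,
# which here is dragged upward by the largest value.
bar_height = mean(sample)

# A box plot instead shows the quartiles, exposing the skew.
q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
print(bar_height)    # ~4.84
print(q1, q2, q3)    # ~4.05, 4.1, 4.45
```

The quartiles sit well below the mean, exactly the kind of distributional detail a bar graph hides.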

Ruinous rainbows

Colour can be very effective for highlighting different aspects of data and adding some life to scientific figures. But it’s also one of the easiest ways to go wrong. Human perception of colour isn’t straightforward, and most scientists don’t take the peculiarities of the visual system into account when choosing colours to represent their data.

One of the most common bad practices is using the rainbow colour scale. From geology to climatology to molecular biology, researchers gravitate toward mapping their data with the help of Roy G. Biv. But the rainbow palette has several serious drawbacks – and very little to recommend it.

Even though it’s derived from the natural light spectrum, the order of colours in the rainbow is not intuitive, says Wong. “You sort of have to think, is blue bigger than green? Is yellow larger than red?”

An even bigger problem is that the rainbow is perceived unevenly by the human brain. People see colour in terms of hue (such as red or blue), saturation (intensity of the colour) and lightness (how much white or black is mixed in). Human brains rely most heavily on lightness to interpret shapes and depth and therefore tend to see the brightest colours as representing peaks and darker colours as valleys. But the brightest colour in the rainbow is yellow, which is usually found somewhere in the middle of the scale, leading viewers to see high points in the wrong places.
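The lightness imbalance is easy to check with the standard Rec. 709 relative-luminance weights for sRGB primaries. This rough sketch ignores gamma correction, so the numbers are approximations, but the ordering is what matters:

```python
# Approximate relative luminance of fully saturated rainbow hues,
# using the Rec. 709 weights (gamma handling omitted for simplicity).
def luminance(r, g, b):
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

rainbow = {
    "red":    (1, 0, 0),
    "orange": (1, 0.5, 0),
    "yellow": (1, 1, 0),
    "green":  (0, 1, 0),
    "blue":   (0, 0, 1),
    "violet": (0.5, 0, 1),
}

for name, rgb in rainbow.items():
    print(f"{name:7s} {luminance(*rgb):.3f}")

# Yellow, sitting mid-scale, is by far the brightest (~0.93),
# while the endpoints (red ~0.21, violet ~0.18) are dark --
# so viewers read false "peaks" in the middle of the range.
```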

Compounding the problem, the transitions between some colours appear gradual, while other changes seem much more abrupt. The underlying data, on the other hand, usually have a consistent rate of change that doesn’t match the perceived unevenness of the rainbow. “You can have perceptual boundaries where none exist and also hide boundaries that do exist,” says climate scientist Ed Hawkins of the University of Reading in England. Even scientists can fall prey to this illusion when interpreting their own data.

To avoid the rainbow problem, some researchers have come up with mathematically based palettes that better match the perceptual change in their colours to changes in the corresponding data. Some of these newer colour scales work specifically for people with red-green colour blindness, which is estimated to affect around 8 percent of men (but only a tiny fraction of women).

Though cartographers and a few scientists like Hawkins have been railing against the rainbow for decades, it remains pervasive in the scientific literature. Some fields of science have probably been using it ever since colour printing was invented. And because many scientists aren’t aware of the problematic aspects of the rainbow, they see no reason to defy tradition. “People are used to using it, so they like it, they feel comfortable with it,” Hawkins says.

This inclination is also encouraged by the fact that the rainbow colour scale is the default for much of the software scientists use to create visualisations. But Hawkins and others have been pushing software makers to change their defaults, with some success.

In 2014, MathWorks switched the default for the MATLAB software program to an improved colour scheme called parula. In 2015 a cognitive scientist and a data scientist developed a new default colour scheme called viridis for making plots with the popular Python programming language. And a new mathematically derived colour scheme called cividis has already been added to a dozen software libraries, though it is not yet the default on any of them.

Hazardous heat maps

One of the most interesting quirks of the human visual system – and one of the most nettlesome for data visualisation – is that our perception of a colour can be influenced by other nearby colours. In some cases the effect is quite dramatic, leading to all sorts of optical illusions.

Whenever a visualisation places different colours, or even shades of the same colour, next to each other, they can interact in unintended ways. The exact same colour will look entirely different when surrounded by a darker shade than it looks when surrounded by a lighter shade, a phenomenon known as simultaneous contrast. A well-known illustration of this, the checker shadow illusion, plays with the brain’s interpretation of colours when a shadow is cast across a checkered grid.

“The effect of colour interactions poses a huge problem,” Wong says. In the life sciences, one pervasive example is the heat map, which is often used to reveal relationships between two sets of data. “If you flip through a journal, a third of the figures are heat maps,” he says. “This is a very popular form of data visualisation that in fact is biasing scientific data.”

A heat map is a two-dimensional matrix, basically a table or grid, that uses colour for each square in the grid to represent the values of the underlying data. Lighter and darker shades of one or more hues indicate lower or higher values. Heat maps are especially popular for displaying data on gene activity, helping researchers identify patterns of genes that are more or less actively producing proteins (or other molecules) in different situations.

Heat maps are great for packing a ton of data into a compact display. But putting various shades of colours right next to each other can trigger the simultaneous contrast illusion. For example, a scientist comparing the colours of individual squares in the grid can easily misinterpret two different shades of orange as being the same – or think that two identical shades are quite different – depending on the colours of the surrounding squares.

“This is a huge problem in heat maps where you’re relying on a bunch of colour tiles sitting next to each other,” Wong says. “This unintentional bias is sort of rampant in every heat map.”

For gene activity data, green and red are often used to show which genes are more or less active. A particular shade of green can look very different surrounded by lighter shades of green compared with when it is surrounded by darker shades of green, or by red or black. The value that the shade of green is representing is the same, but it will appear higher or lower depending on its neighbouring squares.

A blob of bright green squares in one part of the grid might mean that a gene is highly active in a group of closely related subspecies, say of bacteria. At the same time in another part of the grid, a single dull-green square surrounded by black squares may look bright, making it appear that the same gene is highly active in an unrelated bacterial species, when in fact it is only weakly active.

One way to mitigate the problem, Wong says, is to introduce some white space between parts of the grid, perhaps to separate groups of related species, groups of samples or sets of related genes. Breaking up the squares will reduce the interference from neighbouring colours. Another solution is to use an entirely different display, such as a graph that uses lines to connect highly active genes, or a series of graphs that represent change in gene activity over time or between two experimental states.

Muddled messaging

Making sure a visualisation won’t misrepresent data or mislead readers is essential in sharing scientific results. But it’s also important to consider whether a figure is truly drawing attention to the most relevant information and not distracting readers.

For example, the distribution of many data sets when plotted as a line graph or a histogram will have a bell shape with the bulk of the data near the center. “But often we care about what’s on the tails,” Wong says. For the viewer, “that’s often overwhelmed by this big old thing in the middle.”

The solution could be to use something other than height to represent the distribution of the data. One option is a bar code plot, which displays each value as a line. On this type of graph, it is easier to see details in areas of low concentration that tend to all but disappear on a bell curve.

Thoughtfully applied colour can also enhance and clarify a graphic’s message. On a scatterplot that uses different colours to identify categories of data, for instance, the most important information should be represented by the colours that stand out most. Graphing programs may just randomly assign red to the control group because it’s the first column of data, while the interesting mutant that is central to the findings ends up coloured gray.

“Pure colours are uncommon in nature, so limit them to highlight whatever is important in your graphics,” writes data visualisation journalist Alberto Cairo in his 2013 book The Functional Art. “Use subdued hues – grays, light blues and greens – for everything else.”

Besides the rainbow and simultaneous contrast, there are plenty of other ways to get into trouble with colour. Using too many colours can distract from a visualisation’s main message. Colours that are too similar to each other or to the background colour of an image can be hard to decipher.

Colours that go against cultural expectations can also affect how well a reader can understand a figure. On maps that show terrain, for example, the expectation is that vegetation is green, dry areas are brown, higher elevations are white, cities are gray, and of course water is blue. A map that doesn’t observe these well-established colour schemes would be much harder to read. Imagine a US electoral map with Democratic areas shown in red and Republican areas in blue, or a bar chart showing different causes of death in bright, cheery colours – the dissonance would make it harder to absorb their message.

If colour isn’t necessary, sometimes it’s safest to stick with shades of gray. As Tufte put it in his 1990 book Envisioning Information, “Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.”

Visualise the future

Many data visualisation problems persist because scientists simply aren’t aware of them or aren’t convinced that better figures are worth the extra effort, O’Donoghue says.

He’s been working to change this situation by initiating and chairing the annual Vizbi conference focused on visualising biological science, teaching a visualisation workshop for scientists, and combing the literature for evidence of the best and worst practices, which are compiled into his 2018 Annual Reviews paper. But overall, he says, the effort hasn’t gained a lot of momentum yet. “I think we’ve got a long ways to go.”

One reason for the lack of awareness is that most scientists don’t get any training in data visualisation. It’s rarely required of science graduate students, and most institutions don’t offer classes dedicated to scientific visualisation. For many students, particularly in the biomedical sciences, their only exposure to data visualisation is in statistics courses that aren’t tailored to their needs, Weissgerber says.

Scientists also tend to follow convention when it comes to how they display data, which perpetuates bad practices.

One way to combat the power of precedent is by incorporating better design principles into the tools scientists use to plot their data (such as the software tools that have already switched from the rainbow default to more perceptually even palettes). Most scientists aren’t going to learn better visualisation practices, O’Donoghue says, “but they’re going to use tools. And if those tools have better principles in them, then just by default they will [apply those].”

Scientific publishers could also help, he says. “I think the journals can play a role by setting standards.” Early-career scientists take their cues from more experienced colleagues and from published papers. Some journals, including PLoS Biology, eLife and Nature Biomedical Engineering, have already responded to Weissgerber’s 2015 work on bar graphs. “In the time since the paper was published, a number of journals have changed their policies to ban or discourage the use of bar graphs for continuous data, particularly for small data sets,” she says.

With scientific data becoming increasingly complex, scientists will need to continue developing new kinds of visualisations to handle that complexity. To make those visualisations effective – for both scientists and the general public – data visualisation designers will have to apply the best research on humans’ visual processing in order to work with the brain, rather than against it.

Betsy Mason is a freelance journalist based in the San Francisco Bay Area who specialises in science and cartography. She is the coauthor, with Greg Miller, of All Over the Map: A Cartographic Odyssey (National Geographic, 2018).

This article originally appeared in Knowable Magazine, an independent journalistic endeavor from Annual Reviews.

New model predicts the peaks of the COVID-19 pandemic

As of late May, COVID-19 has killed more than 325,000 people around the world. Even though the worst seems to be over for countries like China and South Korea, public health experts warn that cases and fatalities will continue to surge in many parts of the world. Understanding how the disease evolves can help these countries prepare for an expected uptick in cases.

This week in the journal Frontiers in Physics, researchers present a single function that accurately describes all existing available data on active cases and deaths -- and predicts forthcoming peaks. The tool uses q-statistics, a set of functions and probability distributions developed by Constantino Tsallis, a physicist and member of the Santa Fe Institute's external faculty. Tsallis worked on the new model together with Ugur Tirnakli, a physicist at Ege University, in Turkey.

"The formula works in all the countries in which we have tested," says Tsallis.

Neither physicist ever set out to model a global pandemic. But Tsallis says that when he saw the shape of published graphs representing China's daily active cases, he recognized shapes he'd seen before -- namely, in graphs he'd helped produce almost two decades ago to describe the behavior of the stock market.

"The shape was exactly the same," he says. For the financial data, the function described probabilities for stock exchanges; for COVID-19, it described the daily number of active cases -- and fatalities -- as a function of time.

Modeling financial data and tracking a global pandemic may seem unrelated, but Tsallis says they have one important thing in common. "They're both complex systems," he says, "and in complex systems, this happens all the time." Disparate systems from a variety of fields -- biology, network theory, computer science, mathematics -- often reveal patterns that follow the same basic shapes and evolution.

The financial graph appeared in a 2004 volume co-edited by Tsallis and the late Nobelist Murray Gell-Mann. Tsallis developed q-statistics, also known as "Tsallis statistics," in the late 1980s as a generalization of Boltzmann-Gibbs statistics to complex systems.
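At the core of Tsallis statistics is the q-exponential, a one-parameter generalisation of the ordinary exponential that recovers it as q approaches 1. A minimal sketch of the standard definition (the paper's actual fitting function combines this with other factors, which are not reproduced here):

```python
import math

def q_exp(x, q):
    """Tsallis q-exponential: e_q(x) = [1 + (1 - q) x]^(1 / (1 - q)).

    For q = 1 it reduces to the ordinary exponential. When the bracket
    goes non-positive, the value is clipped to zero, the usual convention.
    """
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    if base <= 0:
        return 0.0
    return base ** (1 / (1 - q))

# q = 1 recovers the ordinary exponential...
print(q_exp(1.0, 1))      # e ≈ 2.718

# ...while q > 1 gives a power-law ("fat") tail: the decay is much
# slower than exponential, a hallmark of complex systems.
print(q_exp(-5.0, 1.5))   # 3.5**-2 ≈ 0.0816
print(math.exp(-5.0))     # ≈ 0.0067
```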

In the new paper, Tsallis and Tirnakli used data from China, where the active case rate is thought to have peaked, to set the main parameters for the formula. Then, they applied it to other countries including France, Brazil, and the United Kingdom, and found that it matched the evolution of the active cases and fatality rates over time.

The model, says Tsallis, could be used to create useful tools like an app that updates in real-time with new available data, and can adjust its predictions accordingly. In addition, he thinks that it could be fine-tuned to fit future outbreaks as well.

"The functional form seems to be universal," he says, "Not just for this virus, but for the next one that might appear as well."