Peace has finally been brokered in a long-standing argument between two schools of thought in statistical science.
Research from Deep Mukhopadhyay, professor of statistical science, and Douglas Fletcher, a PhD student, was accepted for publication in Scientific Reports, a journal by Nature Research. Their research marks a significant step towards bridging the “gap” between two different schools of thought in statistical data modeling that has plagued statisticians for over 250 years.
“There are two branches of statistics: Bayesian and Frequentist,” says Mukhopadhyay. “There is a deep-seated division, conceptually and operationally, between them.” The fundamental difference lies in how they process and analyze data. Bayesian statistics incorporates external domain knowledge into data analysis via a so-called “prior” distribution.
Subhadeep Mukhopadhyay
“Frequentists view ‘prior’ as a weakness that can hamper scientific objectivity and can corrupt the final statistical inference,” says Mukhopadhyay. “I could come up with ten different kinds of ‘prior’ if I asked ten different experts. Bayesians, however, view it as a strength to include relevant domain-knowledge into the data analysis.” This disagreement has persisted in statistics for the last 250 years.
So, which camp is right? “In fact, both are absolutely right,” says Mukhopadhyay. In their paper, he and Fletcher argue that a better question to ask is: how can we develop a mechanism that incorporates relevant expert knowledge without sacrificing scientific objectivity?
The answer, Mukhopadhyay says, can ultimately help design artificial intelligence capable of simultaneously learning from both data and expert knowledge, a holy grail problem of 21st-century statistics and AI.
“The science of data analysis must include domain experts’ prior scientific knowledge in a systematic and principled manner,” Mukhopadhyay says. Their paper presents statistical rules to judiciously blend data with domain knowledge, developing a dependable and defensible workflow.
“That is where our breakthrough lies,” says Mukhopadhyay. “It creates a much more refined ‘prior,’ which incorporates the scientist’s knowledge and respects the data, so it’s a compromise between your domain expertise and what the data is telling me.”
Answering that question, namely when and how much to believe prior knowledge, opens up dozens of real-world applications for Mukhopadhyay’s work. For example, healthcare companies can apply this to new drugs by leveraging doctors’ expertise without being accused of cherry-picking data for the sake of a speedy or unusually successful clinical trial.
Mukhopadhyay thanks Brad Efron of Stanford University for inspiring him to investigate this problem. “It took me one and a half years to come up with the right question,” says Mukhopadhyay. “I believe Bayes and Frequentist could be a winning combination that is more effective than either of the two separately in this data science era.”
*This article corrects an earlier version by specifying that the research was published in Scientific Reports, a journal by Nature Research.