Several AI tools aim to summarize scientific findings to help researchers.

Credit: Dimitri Otis/Getty

As large language models (LLMs) gallop ever onwards — including GPT-4, OpenAI’s latest incarnation of the technology behind ChatGPT — scientists are beginning to make use of their power. The explosion of tools powered by artificial intelligence (AI) includes several search engines that aim to make it easier for researchers to grasp seminal scientific papers or summarize a field’s major findings. Their developers claim the apps will democratize and streamline access to research.

But some tools need more refinement before researchers can rely on them in their work, say scientists who have experimented with them. Clémentine Fourrier is a Paris-based researcher who evaluates LLMs at Hugging Face, a company in New York City that develops open-source AI platforms. She used an AI search engine called Elicit, which uses an LLM to craft its answers, to help find papers for her PhD thesis. Elicit searches papers in the Semantic Scholar database and identifies the top studies by comparing the papers’ titles and abstracts with the search question.
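That ranking step — scoring papers by how closely their titles and abstracts match the query — can be illustrated with a toy sketch. The bag-of-words cosine similarity below is a hypothetical stand-in for the learned text embeddings a production system such as Elicit would actually use, and the paper records are invented for illustration.

```python
from collections import Counter
import math

def embed(text):
    # Toy stand-in for a learned text embedding: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, papers, top_k=3):
    # Score each paper's title + abstract against the query, best first.
    q = embed(query)
    scored = [(cosine(q, embed(p["title"] + " " + p["abstract"])), p)
              for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

papers = [  # invented example records, not real search results
    {"title": "Scaling laws for neural language models",
     "abstract": "We study how language models improve with scale."},
    {"title": "Gut microbiome development in infants",
     "abstract": "We profile the infant gut microbiome over time."},
]
print(rank("large language models scaling", papers, top_k=1)[0]["title"])
```

A real system would swap `embed` for a neural encoder, but the retrieval logic — embed the query, embed each candidate, sort by similarity — is the same shape.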

Variable success

Fourrier says that, in her experience, Elicit didn’t always pick the most relevant papers. The tool is good for suggesting papers “that you probably wouldn’t have looked at”, she says. But its paper summaries are “useless”, and “it’s also going to suggest a lot of things that are not directly relevant”, she adds. “It’s very likely that you’re going to make a lot of mistakes if you only use this.”


Jungwon Byun, chief operating officer at Ought, the company in San Francisco, California, that built Elicit, says: “We currently have hundreds of thousands of users with diverse specializations so Elicit will inevitably be weaker at some queries.” The platform works differently from other search engines, says Byun, because it focuses less on keyword match, citation count and recency. But users can filter for those things.

Other researchers have had more positive experiences with the tool. “Elicit.org is by far my favourite for search,” says Aaron Tay, a librarian at Singapore Management University. “It is close to displacing Google Scholar as my first go-to search for academic search,” he says. “In terms of relevancy, I had the opposite experience [to Fourrier] with Elicit. I normally get roughly the same relevancy as Google Scholar — but once in a while, it interprets my search query better.”

These discrepancies might be field-dependent, Tay suggests. Fourrier adds that, in her research area, time is critical. “A year in machine learning is a century in any other field,” she says. “Anything prior to five years is completely irrelevant,” and Elicit doesn’t pick up on this, she adds.

Full-text search

Another tool, scite, whose developers are based in New York City, uses an LLM to organize and add context to paper citations — including where, when and how a paper is cited by another paper. Whereas ChatGPT is notorious for ‘hallucinations’ — inventing references that don’t exist — scite and its ‘Assistant’ tool remove that headache, says scite chief executive Josh Nicholson. “The big differentiator here is that we’re taking that output from ChatGPT, searching that against our database, and then matching that semantically against real references.” Nicholson says that scite has partnered with more than 30 scholarly publishers including major firms such as Wiley and the American Chemical Society and has signed a number of indexing agreements — giving the tool access to the full text of millions of scholarly articles.
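The verification step Nicholson describes — checking an LLM's generated citations against a database of real papers — can be sketched as follows. The string-similarity matching and threshold here are hypothetical simplifications; scite matches semantically against an index of millions of articles, and the mini-database below is invented for illustration.

```python
import difflib

# Hypothetical mini-database of real paper titles; a production index
# like scite's covers millions of full-text articles.
DATABASE = [
    "Attention is all you need",
    "Deep residual learning for image recognition",
]

def verify_reference(generated_title, db=DATABASE, threshold=0.8):
    """Match an LLM-generated citation against known real papers.

    Returns the best-matching real title, or None if nothing in the
    database is close enough — i.e. the citation looks hallucinated.
    """
    score_of = lambda real: difflib.SequenceMatcher(
        None, generated_title.lower(), real.lower()).ratio()
    best = max(db, key=score_of)
    return best if score_of(best) >= threshold else None

print(verify_reference("Attention Is All You Need"))    # real paper found
print(verify_reference("Quantum blockchain for cats"))  # -> None (hallucinated)
```

The design point is that the LLM's free-form output is never shown to the user directly: every reference is grounded in a record that actually exists, or discarded.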


Nicholson says that scite is also collaborating with Consensus — a tool that “uses AI to extract and distill findings” directly from research — launched in 2022 by programmers Eric Olson and Christian Salem, both in Boston, Massachusetts. Consensus was built for someone who’s not an expert in what they’re searching for, says Salem. “But we actually have a lot of researchers and scientists using the product,” he adds.

Like Elicit, Consensus uses Semantic Scholar data. “We have a database of 100-million-plus claims that we’ve extracted from papers. And then when you do a search, you’re actually searching over those claims,” says Olson. Consensus staff manually flag contentious or disproven claims — for example, that vaccines cause autism, says Olson. “We want to get to a state where all of that is automated,” says Salem, “reproducing what an expert in this field would do to detect some shoddy research.”
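Searching over a store of extracted claims, with manual flags attached to contentious or disproven ones, can be sketched minimally. The claim records and the word-overlap matching below are invented for illustration; Consensus's real index holds more than 100 million claims and its flagging is done by staff.

```python
# Hypothetical claim store: each entry is a claim extracted from a paper,
# optionally carrying a manual flag set by a human reviewer.
claims = [
    {"text": "Vaccines do not cause autism", "paper": "Paper A", "flag": None},
    {"text": "Vaccines cause autism", "paper": "Paper B", "flag": "disproven"},
]

def search_claims(query, store):
    """Return claims sharing at least one word with the query."""
    q = set(query.lower().split())
    return [c for c in store if q & set(c["text"].lower().split())]

for claim in search_claims("vaccines autism", claims):
    label = f" [{claim['flag'].upper()}]" if claim["flag"] else ""
    print(f"{claim['text']} ({claim['paper']}){label}")
```

The key property is that the flag travels with the claim through search, so a disproven result surfaces with its warning rather than silently contributing to a consensus score.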

Room for improvement

Meghan Azad, a child-health paediatrician at the University of Manitoba in Winnipeg, Canada, asked Consensus whether vaccines cause autism, and was unconvinced by the results, which reported that 70% of the research says vaccines do not cause autism. “One of the citations was about ‘do parents believe vaccines cause autism?’, and it was using that to calculate its consensus. That’s not a research study giving evidence, yes or no, it’s just asking what people believe.”

Mushtaq Bilal, a postdoc at the University of Southern Denmark in Odense, tests AI tools and tweets about how to get the most out of them. He likes Elicit, and has looked at Consensus. “What they’re trying to do is very useful. If you have a yes/no question, it will give you a consensus, based on academic research,” he says. “It gives me a list of the articles that it ran through to arrive at this particular consensus,” Bilal explains.

Azad sees a role for AI search engines in academic research in future, for example replacing the months of work and resources required to pull together a systematic review. But for now, “I’m not sure how much I can trust them. So I’m just playing around,” she says.