  Major AI conference flooded with peer reviews written fully by AI

    aum


    Controversy has erupted after 21% of manuscript reviews for an international AI conference were found to be generated by artificial intelligence. 

     

    What can researchers do if they suspect that their manuscripts have been peer reviewed using artificial intelligence (AI)? Dozens of academics have raised concerns on social media about manuscripts and peer reviews submitted to the organizers of next year’s International Conference on Learning Representations (ICLR), an annual gathering of specialists in machine learning.

     

    Among other things, they flagged hallucinated citations and suspiciously long and vague feedback on their work.

     

    Graham Neubig, an AI researcher at Carnegie Mellon University in Pittsburgh, Pennsylvania, was one of those who received peer reviews that seemed to have been produced using large language models (LLMs). The reports, he says, were “very verbose with lots of bullet points” and requested analyses that were not “the standard statistical analyses that reviewers ask for in typical AI or machine-learning papers.”

     

    But Neubig needed help proving that the reports were AI-generated. So, he posted on X (formerly Twitter) and offered a reward for anyone who could scan all the conference submissions and their peer reviews for AI-generated text. The next day, he got a response from Max Spero, chief executive of Pangram Labs in New York City, which develops tools to detect AI-generated text.

     

    Pangram screened all 19,490 studies and 75,800 peer reviews submitted for ICLR 2026, which will take place in Rio de Janeiro, Brazil, in April. Neubig and more than 11,000 other AI researchers will be attending.

     

    Pangram’s analysis revealed that around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use. The findings were posted online by Pangram Labs. “People were suspicious, but they didn’t have any concrete proof,” says Spero. “Over the course of 12 hours, we wrote some code to parse out all of the text content from these paper submissions,” he adds.

     

    The conference organizers say they will now use automated tools to assess whether submissions and peer reviews breached the conference’s policies on AI use. This is the first time that the conference has faced this issue at scale, says Bharath Hariharan, a computer scientist at Cornell University in Ithaca, New York, and senior programme chair for ICLR 2026. “After we go through all this process … that will give us a better notion of trust.”


    AI-written peer review

     

    The Pangram team used one of its own tools, which predicts whether text is generated or edited by LLMs. Pangram’s analysis flagged 15,899 peer reviews as fully AI-generated. It also identified many submitted manuscripts with suspected AI-generated text: 199 manuscripts (1%) were found to be fully AI-generated, and 9% contained more than 50% AI-generated text, although 61% of submissions were mostly human-written.

     

    Pangram described the model in a preprint [1], which it submitted to ICLR 2026. Of the four peer reviews that manuscript received, the team’s analysis flagged one as fully AI-generated and another as lightly AI-edited.

     

    For many researchers who received peer reviews for their submissions to ICLR, the Pangram analysis confirmed what they had suspected. Desmond Elliott, a computer scientist at the University of Copenhagen, says that one of the three reviews he received seemed to have missed “the point of the paper”. His PhD student, who led the work, suspected that the review had been generated by an LLM, because it reported incorrect numerical results from the manuscript and contained odd expressions.


    When Pangram released its findings, Elliott adds, “the first thing I did was I typed in the title of our paper because I wanted to know whether my student’s gut instinct was correct”. The suspect peer review, which Pangram’s analysis flagged as fully AI-generated, gave the manuscript the lowest rating, leaving it “on the borderline between accept and reject”, says Elliott. “It’s deeply frustrating.”


    Repercussions


    The ICLR 2026 team permitted authors and reviewers to use AI tools to polish text, generate experiment code or analyse results, but required them to disclose such use. It also prohibited AI use that would breach the confidentiality of manuscripts or produce falsified content.


    The conference organizers will now use the Pangram analysis, as well as other automated tools, to assess whether submissions and reviews breached these policies, and will penalize authors and reviewers who violated them.


    Researchers who oversee the peer-review process “are going to be asked to flag poor-quality reviews, not just reviews that are generated by LLMs,” says Hariharan. He adds that the “bar for desk-rejecting reviewers is going to be high. Given that these automated tools may have false positives, we are not going to be relying completely on those.”

     

    Some authors have withdrawn their ICLR submissions because peer reviews of the manuscripts contained false claims. Others are still wondering how to respond to the peer reviews they received. “As a scientist, I’ve been in this game long enough that I know I’m going to get some low-quality reviews when we submit work to conferences,” says Elliott. But the suspected AI-generated reviews tend to contain “a lot of content”, he adds. Some of this “is relevant and worth responding to, but other parts don’t make sense”.


    The situation at ICLR 2026 highlights the mounting pressure on peer reviewers to keep pace with a fast-growing field. “In AI and machine learning right now, we have a crisis in terms of reviewing, because the field has expanded exponentially for the past five years,” says Neubig.


    Hariharan says each ICLR reviewer was assigned, on average, five papers to review within two weeks. “That is a very significant load. It is much higher than what has been done in the past.” He says discussions are taking place on how to manage this. “Everyone in the community is aware that we are in a regime where all of us are doing significantly more volunteer work than we used to.”

     

    Source

