OpenAI develops LLM that uses a chain of thought like humans

    alf9872000


OpenAI has released a new paper outlining advancements it has made toward eliminating the common problem of hallucinations, where AI simply makes things up. The paper describes two approaches to training reward models, called outcome supervision and process supervision, and compares how well they weed out hallucinations.


With outcome supervision, OpenAI trains reward models to provide feedback only on the final result the AI gives. With process supervision, the reward model provides feedback at every step of the way, rewarding each step in a human-like chain of thought.
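To illustrate the distinction, here is a minimal, hypothetical sketch; the scoring functions are placeholders and this is not OpenAI's actual implementation. An outcome-supervised reward model scores only the final answer, while a process-supervised reward model scores each intermediate reasoning step and combines those scores.

# Hypothetical illustration of the two feedback schemes described above.
# The scoring functions below are placeholders, not OpenAI's reward models.

from typing import List

def score_final_answer(answer: str) -> float:
    # Placeholder: reward the final answer only.
    return 1.0 if answer.strip() == "42" else 0.0

def score_step(step: str) -> float:
    # Placeholder: reward a single reasoning step.
    return 1.0 if "error" not in step.lower() else 0.0

def outcome_supervision_reward(steps: List[str], answer: str) -> float:
    # Outcome supervision: feedback depends solely on the final result,
    # regardless of how the model got there.
    return score_final_answer(answer)

def process_supervision_reward(steps: List[str], answer: str) -> float:
    # Process supervision: feedback is given for every step of the
    # chain of thought, then averaged into an overall reward.
    step_rewards = [score_step(s) for s in steps]
    return sum(step_rewards) / len(step_rewards) if step_rewards else 0.0

if __name__ == "__main__":
    chain_of_thought = ["Multiply 6 by 7.", "6 * 7 = 42."]
    print(outcome_supervision_reward(chain_of_thought, "42"))  # 1.0
    print(process_supervision_reward(chain_of_thought, "42"))  # 1.0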


In its research paper, OpenAI tested both approaches on a math dataset and found that the process supervision method led to “significantly better performance”. It’s important to note that process supervision has only been tested in the area of mathematics so far, and that it will take more work to see how it performs more generally.


    Explaining the possible outcomes of the process supervision method, OpenAI said:


    “If these results generalize, we may find that process supervision gives us the best of both worlds – a method that is both more performant and more aligned than outcome supervision.”


It’s still too early to say how much this step-by-step verification will help to address hallucinations more generally, but hopefully it will, because hallucinations are probably the number one issue with LLMs right now. Just this week, a lawyer who had used ChatGPT for his work submitted false information detailing fake cases that the AI had dreamt up.


OpenAI has not given a timeline for implementing process supervision in the publicly available ChatGPT. The technique is still in the research phase and needs to be tested on more general information.


While the initial results are good, OpenAI does note that safer methods can come at the cost of reduced performance, a penalty known as an alignment tax. The results so far show that process supervision does not incur this tax on math problems, but we don’t yet know how it will fare on more general information.


    Source

