OpenAI's critique-writing models, trained to describe flaws in summaries, can be used to scale supervision of machine learning systems to tasks that are difficult for humans to assess directly

Future AI systems performing extremely complex tasks will need to be driven by human intent. Several previous efforts to align language models have used human ratings as a training signal, but humans have a hard time evaluating difficult tasks, such as finding every bug in a codebase or every factual error in a long essay. Models trained on such feedback can therefore learn to produce outputs that look good to humans yet contain flaws the evaluators overlooked.
Reading an entire book takes a lot of effort, but a human who is given chapter summaries has a much easier time evaluating a book summary. Following this idea, the researchers train AI assistants that help humans give feedback on complex tasks: assistants that point out flaws, explain what is going on, and answer the evaluators' questions.
As a proof of concept, the researchers assembled a dataset of more than 6,000 questions and summaries covering more than 2,000 different passages. The passages are drawn from short stories, Wikipedia articles, and web articles (mostly news) collected from the Internet. Most tasks were generated from passages shorter than 2,048 tokens when encoded with the GPT-2 tokenizer; they also collected a held-out set of tasks based on passages of up to 4,096 tokens. The resulting models are used both to assist human evaluators and to study the scaling properties of critique writing.
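For readers who want to see what the length filtering amounts to, here is a minimal sketch using the Hugging Face `transformers` GPT-2 tokenizer; the `split_by_length` helper and its thresholds are illustrative, based only on the figures quoted above, not the authors' released pipeline.

```python
# Minimal sketch: bucket passages by GPT-2 token count, keeping shorter
# passages for training-style tasks and longer ones for a held-out set.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def split_by_length(passages, train_limit=2048, heldout_limit=4096):
    """Split passages into (under train_limit, between limits) by token count."""
    train, heldout = [], []
    for text in passages:
        n_tokens = len(tokenizer.encode(text))
        if n_tokens < train_limit:
            train.append(text)
        elif n_tokens <= heldout_limit:
            heldout.append(text)
    return train, heldout
```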
Experiments with AI-assisted evaluation
To assess the value of the models as evaluation assistants, they show labelers eight model-written critiques of each summary, with a control group that receives no assistance. The topic-based summaries come from three sources: their models, human writers, and human writers instructed to introduce subtle but significant flaws.
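To make the setup concrete, here is a minimal sketch of how such labeler assignments could be organized; the `Assignment` record, the `make_assignments` helper, and the control-group fraction are hypothetical, not the paper's actual tooling.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Assignment:
    summary_id: str
    source: str              # "model", "human", or "human_with_flaws"
    critiques: list = field(default_factory=list)  # empty for the control group

def make_assignments(summaries, critiques_by_summary, control_fraction=0.25, seed=0):
    """Pair each summary with eight model critiques, or none for the control group.

    `summaries` is a list of dicts with "id" and "source"; `critiques_by_summary`
    maps a summary id to its model-written critiques.
    """
    rng = random.Random(seed)
    assignments = []
    for s in summaries:
        if rng.random() < control_fraction:
            assignments.append(Assignment(s["id"], s["source"]))  # no assistance
        else:
            assignments.append(
                Assignment(s["id"], s["source"], critiques_by_summary[s["id"]][:8])
            )
    return assignments
```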
The researchers also found that, unlike small models, large models can directly improve their own outputs by using self-critiques. Models refined with better critiques improve more than models given poor critiques or none at all.
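The critique-then-refine loop can be sketched as follows; `generate` is a stand-in for any large language model call, and the prompt wording and helper name are illustrative rather than taken from the paper.

```python
def generate(prompt: str) -> str:
    """Stand-in for a call to a large language model; replace with a real API."""
    raise NotImplementedError

def refine_with_self_critique(passage: str, topic: str) -> dict:
    # 1. Draft a topic-based summary of the passage.
    summary = generate(
        f"Passage:\n{passage}\n\nSummarize the passage with respect to: {topic}"
    )
    # 2. Ask the same model to critique its own draft.
    critique = generate(
        f"Passage:\n{passage}\n\nSummary:\n{summary}\n\nList the flaws in this summary."
    )
    # 3. Ask for a revision conditioned on the critique.
    refined = generate(
        f"Passage:\n{passage}\n\nSummary:\n{summary}\n\nCritique:\n{critique}\n\n"
        "Rewrite the summary, fixing the problems the critique points out."
    )
    return {"summary": summary, "critique": critique, "refined": refined}
```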
Do the models share everything they know with us?
To provide the best evaluation assistance on difficult tasks, the researchers want models to communicate all the problems they "know" about. Whenever a model can accurately predict that an answer is flawed, can it also give honest feedback that humans understand?
This is especially crucial when supervising models that may try to mislead human supervisors or hide information. The aim is to train equally capable assistant models to point out what humans miss.
Unfortunately, the researchers found that the models are better at discriminating between good and flawed answers than at critiquing them, implying that they "know" about some problems they cannot or will not articulate. Moreover, the gap between discrimination ability and critique ability does not appear to narrow for larger models. Closing this gap is a primary goal for alignment research.
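One way to picture the gap, with purely illustrative numbers and simplified metric definitions (the paper's own measures are more involved): compare how often the model scores the unflawed answer above the flawed one against how often its written critique surfaces a real flaw.

```python
def discrimination_accuracy(pairs):
    """pairs: (score_for_good_answer, score_for_flawed_answer) from the model."""
    return sum(good > flawed for good, flawed in pairs) / len(pairs)

def critique_success_rate(critique_ratings):
    """critique_ratings: 1 if labelers judged the critique to surface a real flaw, else 0."""
    return sum(critique_ratings) / len(critique_ratings)

# Illustrative numbers only: a model that separates good from flawed answers 90% of
# the time but surfaces the flaw in a readable critique only 60% of the time has a
# 30-point discrimination-critique gap.
gap = (discrimination_accuracy([(0.9, 0.4)] * 9 + [(0.3, 0.7)])
       - critique_success_rate([1] * 6 + [0] * 4))
print(f"discrimination-critique gap: {gap:.2f}")
```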
The base task is topic-based summarization, which is closely related to query-based and question-focused summarization. Instead of attempting to summarize the entire text, a topic-based summary focuses on a particular aspect of the passage.
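A hypothetical task instance makes the format concrete; the field names and content below are illustrative, not drawn from the released dataset.

```python
# Illustrative topic-based summarization example: the summary addresses only the
# stated topic, not the whole passage.
task = {
    "passage": "A long news article covering an entire city council meeting ...",
    "topic": "What did the council decide about the new bike lanes?",
    "summary": "The council approved a pilot program adding protected bike lanes "
               "on two downtown streets, to be reviewed after one year.",
}
```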
Next steps
A significant limitation of this study is that topic-based summarization is not a hard task: humans understand it well and can evaluate a summary in about 10 minutes. To better understand the limits of AI-assisted evaluation, the researchers plan to tackle tasks that are much harder for humans to assess.
Nevertheless, these results give hope that models can be trained to give useful feedback to human evaluators. Building on earlier work on debate and recursive reward modeling, such assistance is a fundamental pillar of the researchers' alignment approach. In the long term, they hope to build trustworthy assistants that take on all the cognitive work of evaluation, allowing humans to focus on communicating their preferences.
This article is a summary written by Marktechpost Staff based on the paper 'Self-critiquing models for assisting human evaluators'. All credit for this research goes to the researchers on this project. Check out the paper, dataset, and blog post.