There were two stages of assessment and no interview. The first established the guidelines, and the second was a longer exercise in applying them. Both involved comparing two or more model responses across several metrics and fact-checking them.