Candidates shows how many candidates are in the task. Click on the number of Candidates to jump to the Candidate Feedback screen.


Decisions shows how many judging decisions have been made. Click on the number of Decisions to jump to the Judge Feedback screen.


Reliability is a measure of the consistency of the judging. It ranges from 0 to 1: 0 represents complete inconsistency (all noise), while 1 represents complete consistency between all judges. Values above 0.7 are typically high enough for low-stakes tests, while high-stakes tests should aim for a value above 0.9. Higher values can typically be achieved with more judgements, but there are diminishing returns.
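To illustrate the diminishing returns from extra judgements, the sketch below uses the Spearman-Brown formula, a standard psychometric result relating reliability to test length. This is an illustration only, not necessarily the exact calculation the platform uses.

```python
def spearman_brown(reliability: float, factor: float) -> float:
    """Predicted reliability after multiplying the number of judgements
    by `factor`. Standard Spearman-Brown prophecy formula; shown here
    as an illustration, not as the platform's own calculation."""
    return factor * reliability / (1 + (factor - 1) * reliability)

# Starting from a reliability of 0.7:
print(spearman_brown(0.7, 2))  # ~0.82 with double the judgements
print(spearman_brown(0.7, 4))  # ~0.90 with four times the judgements
```

Doubling the judgements lifts a reliability of 0.7 to about 0.82, but reaching 0.9 takes roughly four times as many: the diminishing returns mentioned above.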


Inter-rater is the correlation between two randomly selected groups of judges. Judges are split into two random halves, and the correlation between the True Scores produced by each half is calculated. The process is repeated four times, and the average and standard deviation are calculated. In the example above the correlation is 0.75 and the standard deviation is 0.03. You need to judge each script around 20 times to get a stable inter-rater correlation. A minimal sketch of this split-half procedure follows.
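The sketch below implements the split-half procedure described above. It assumes the judging decisions have already been turned into a judges-by-scripts array of scores, and it averages each half's scores as a stand-in for the True Scores, a simplification for illustration rather than the platform's actual model fit.

```python
import numpy as np

def interrater_correlation(scores, n_repeats=4, seed=0):
    """Split-half inter-rater correlation.

    `scores` is a (judges x scripts) array of per-judge scores; this
    layout is an assumption for illustration. Judges are split into two
    random halves, the per-script mean scores of each half are
    correlated, and the split is repeated `n_repeats` times. Returns
    the mean and standard deviation of the correlations.
    """
    rng = np.random.default_rng(seed)
    n_judges = scores.shape[0]
    corrs = []
    for _ in range(n_repeats):
        order = rng.permutation(n_judges)
        half_a = order[: n_judges // 2]
        half_b = order[n_judges // 2 :]
        a = scores[half_a].mean(axis=0)  # stand-in "True Scores", half A
        b = scores[half_b].mean(axis=0)  # stand-in "True Scores", half B
        corrs.append(np.corrcoef(a, b)[0, 1])
    return np.mean(corrs), np.std(corrs)

# Toy data: 50 scripts with a true quality, judged by 20 noisy judges.
rng = np.random.default_rng(1)
true_quality = rng.normal(size=50)
scores = true_quality + rng.normal(size=(20, 50))
mean_r, sd_r = interrater_correlation(scores)
print(f"inter-rater: {mean_r:.2f} (sd {sd_r:.2f})")
```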


Created is the time and date a task was created.  


Refresh generates the Candidate scores. Click the Refresh button to calculate or update them.


Comments shows the number of comments made by judges. Click on the number of comments to read them.


Remove deletes the task.