The Nature Trail Rabbitry BLOG
Friday, December 15, 2006
  Testing Theory and Judging
If you've been reading my BLOG for a while, you may remember that I was once a personnel analyst and then later a consultant in the area of research and development. In plain English, I used to research jobs, develop tests, and then validate the tests to provide legal defensibility for them.

So, I had to be familiar with testing theory. Now I'm not going to be talking about the test that judges take or the registrar's test. You see, judging is also a test. Another word for test is evaluation. And testing theory covers written and performance tests and other types of evaluations (of experience and education in the world of personnel).

In our department, we had trained evaluators who reviewed resumes for qualifications and ranked the applicants according to their relative value for a job opening. Applicants below the minimum qualifications did not make the list.

In theory, if the evaluators (judges) were perfectly trained, the rating guide (which would be equivalent to our standard) were perfectly written, and the guide were perfectly applied, all evaluators would come out with the same result.

But, people being people, all written documents being imperfect, and judgment being involved in applying the guide, there are invariably some differences from one "test" to another.

I've heard judges say that they are being paid for their opinion and it doesn't matter to them whether anyone else agrees with it. Well, I think to survive as a judge you have to have a thick skin and a healthy ego. If that statement comes from those two things, then I guess it's just fine. If the statement comes from a defensiveness that takes the place of knowledge and experience, it's a problem, as is an inflated ego.

You see, there are a couple of other parts to testing theory that are really important. They are inter-rater and intra-rater reliability. Remember the perfect scenario above? In the perfect world, all of the raters (or evaluators or judges) would rate a subject exactly the same. We know it's not perfect, but the extent to which they are the same should be high, if they are well trained, using the same standard, and applying it well. That's inter-rater reliability.
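
For anyone who likes to see the idea in numbers, here is a rough Python sketch of one simple way to measure inter-rater reliability: the share of rabbit pairs that two judges put in the same order. The judges, rabbits, and placements are made up for illustration, and pairwise agreement is only one of several measures a testing department might use.

```python
from itertools import combinations

def pairwise_agreement(rank_a, rank_b):
    """Fraction of rabbit pairs that two judges put in the same order."""
    pairs = list(combinations(rank_a, 2))
    same = sum(
        1 for x, y in pairs
        if (rank_a[x] < rank_a[y]) == (rank_b[x] < rank_b[y])
    )
    return same / len(pairs)

# Hypothetical placements (1 = first place) of the same four rabbits.
judge_1 = {"A": 1, "B": 2, "C": 3, "D": 4}
judge_2 = {"A": 1, "B": 3, "C": 2, "D": 4}  # swaps B and C only
judge_3 = {"A": 4, "B": 3, "C": 2, "D": 1}  # reverses the whole class

print(pairwise_agreement(judge_1, judge_2))  # high agreement
print(pairwise_agreement(judge_1, judge_3))  # no agreement
```

A value near 1 means two judges are ordering the class almost identically; a value near 0 means they are close to opposite.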

Intra-rater reliability says that if the same evaluator rates the same subject multiple times, the evaluation should come out the same each time. Now here's a word of caution: with a resume, it can be a photocopy of the original and the subject is exactly the same. But the same rabbit is not the same rabbit on a different day. First, we have a competition against other rabbits, not just a points scale. Perhaps a judge is totally consistent and mentally gives a rabbit 93 points at two different shows. That could be a first in one show and a third in another. The other difference you are all too painfully aware of: condition is not static; it is constantly changing, even within a day or weekend.
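
The 93-point example can be worked through in a few lines of Python. This is only a sketch: the competing rabbits and their point totals are invented, and real judging is not done by adding up points on paper.

```python
def placement(scores, rabbit):
    """Return the 1-based placement of `rabbit` when ranked by points, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(rabbit) + 1

# Hypothetical classes: the judge gives "my_rabbit" 93 points both times,
# but the competition on the table is different.
show_1 = {"my_rabbit": 93, "rival_x": 91, "rival_y": 88}
show_2 = {"my_rabbit": 93, "rival_x": 95, "rival_y": 94}

print(placement(show_1, "my_rabbit"))  # 1
print(placement(show_2, "my_rabbit"))  # 3
```

The judge is perfectly consistent in both shows; only the competition changed.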

So, should a judge care whether his or her placements are consistent with other judges? Yes and no. Why is that? Well, it takes a bit more testing theory to explain.

We used subject matter experts to help us evaluate our test questions. We tested the test. The theory is that the people who answer questions the way the top performers in the subject do are the ones we want to choose. Those are the people who should do better on an evaluation. Now realize, we're turning the tables and evaluating the judges a bit here, not judging rabbits.

So, if the majority of judges would pick rabbit A over rabbit B, but this certain judge picks B over A, is that a bad thing for the judge? Maybe. It depends on why there is a difference.

We used to write questions at various levels. Some questions everybody got right. We'd say that those questions did not discriminate - that is, they didn't tell us anything about the job applicant. Certain calls, every judge would make the same. Don't believe me? Put a three-legged rabbit on the table and see what happens. Or, put a Flemish Giant up with the Netherland Dwarfs. I guarantee you 100% of the time, the judge will do the same thing.

Some questions did a fairly good job of discriminating. We would look at the top, middle, and bottom third of responses to the subject matter. That is, the folks who got the most questions right on the test would be called the "top third," the ones who got an average number of questions right formed the "middle third," and those who did the poorest on the test overall were the "bottom third." In those questions that did a pretty good job of discriminating (telling us who was best), most or all of the top third would get the question right, some of the middle third would get the question right, and few of the bottom third would get the question right.
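
Here is a minimal Python sketch of that top/middle/bottom comparison for a single question. The nine applicants, their total scores, and the question id "Q7" are all hypothetical, and the "answered" sets list only the one question under study. The top-minus-bottom difference at the end is a common way to boil the comparison down to one number; I am not claiming it is exactly what we computed back then.

```python
def correct_rate_by_third(results, question):
    """Share of each third (ranked by total score) that got `question` right."""
    ranked = sorted(results, key=lambda r: r["total"], reverse=True)
    n = len(ranked) // 3
    groups = {
        "top": ranked[:n],
        "middle": ranked[n:len(ranked) - n],
        "bottom": ranked[len(ranked) - n:],
    }
    return {
        name: sum(question in r["answered"] for r in group) / len(group)
        for name, group in groups.items()
    }

# Hypothetical applicants: overall test score plus the questions answered correctly.
applicants = [
    {"total": 48, "answered": {"Q7"}},
    {"total": 45, "answered": {"Q7"}},
    {"total": 44, "answered": {"Q7"}},
    {"total": 37, "answered": {"Q7"}},
    {"total": 35, "answered": set()},
    {"total": 33, "answered": {"Q7"}},
    {"total": 25, "answered": set()},
    {"total": 22, "answered": {"Q7"}},
    {"total": 18, "answered": set()},
]

rates = correct_rate_by_third(applicants, "Q7")
print(rates)
# A larger top-minus-bottom gap means the question separates the groups better.
print(rates["top"] - rates["bottom"])
```

With this made-up data, all of the top third, most of the middle third, and a few of the bottom third get the question right, which is the "fairly good" pattern described above.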

Think of a rabbit that is a pretty good example of its breed and it's competing against rabbits that are definitely not of its caliber. All of the top judges will place that rabbit correctly. Most of the middle level of judges will get it right. Even a few poorer judges will still figure out which is the best rabbit on the table. That situation is a test that tells us something about the judges' abilities. In this case, the judge would definitely want to be placing the rabbit like the majority.

Sometimes when we were writing test questions, we'd write a really, really good one that discriminated well. That is, if you got this one right, you really were more likely to be good at the subject. The top group mostly got it, few in the middle got it right, and virtually none of the bottom group got it.

With rabbits, that's when there are several nice specimens and the judge has to really be able to know and apply the standard well. In this case, the judge wants to place rabbits, not so much like the majority of judges, but more like the majority of judges skilled in this particular breed.

The last type of question that we wrote was usually problematic - it really, really told the tale. It could be a bit of an enigma for us. That's because we inadvertently hit some common misperception. In this case, only a portion of the top group got the real answer and everyone else chose the common misperception. If we didn't know the real answer to the question, we might assume that we miskeyed the question! For example, most respondents like answer B. Only a few of the really good test takers chose the right answer, A. It screws up our statistics and makes folks scratch their heads.
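
To show why such a question looks miskeyed, here is a small Python sketch with invented answers. The groups, the answer letters, and the review flag are assumptions for illustration only, not a reconstruction of our actual statistics.

```python
from collections import Counter

def key_share(answers, key):
    """Fraction of a group that chose the keyed (correct) answer."""
    return answers.count(key) / len(answers)

# Hypothetical responses to one question whose correct answer is "A".
key = "A"
answers_by_group = {
    "top": ["A", "B", "A", "B"],
    "middle": ["B", "B", "B", "B"],
    "bottom": ["B", "B", "B", "B"],
}

all_answers = [a for group in answers_by_group.values() for a in group]
most_popular = Counter(all_answers).most_common(1)[0][0]
shares = {name: key_share(ans, key) for name, ans in answers_by_group.items()}

# The question gets flagged when the crowd's favorite is not the keyed answer.
flag_for_review = most_popular != key

print(most_popular)     # B
print(shares)
print(flag_for_review)  # True
```

The flag only says "look at this one again." Someone who actually knows the subject still has to decide whether the key is wrong or the crowd is.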

In the rabbit world, that's when little known DQs come into play (proper undercolors, presence/absence of rings, martenizing, malformed skeleton, etc.). Or, that's when the very detailed knowledge of each breed comes in - depth over the hindquarter, proper head mount, details in markings - it would be different for every breed. Only a few judges really skilled in these breeds catch these fine points. Often, it is only the judge who has successfully raised the breed who doesn't fall for some common misperception, who really understands the breed down to the minutiae. It can leave the onlookers scratching their heads. "Everyone else liked this rabbit - why not you?"

In this case, a judge definitely does not want to be placing rabbits like the majority. In this case, the judge wants to be in the small minority. However, it can be difficult to tell, when you are in the minority, whether you are in it for the right reason or the wrong reason. If the answer is A and the common misperception is B, you could still be selecting C or D.

So, should the judge care about how he or she stacks up with other judges? Yes. And no. He or she should not be placing multi-leg, nationally competitive rabbits in good condition at the bottom of the table (unless he or she knows something few others know). He or she is not showing inter-rater reliability in this case. The judge should be interested in whether he or she is placing certain breeds similarly to those judges who are outstanding in those breeds. There's no reason to strive to match the mediocrity of the masses. A judge who has an expertise in a breed should not be in step with the multitude of judges who just have a general understanding of a breed.

A judge should place the same rabbit or same type of rabbit similarly each time. The actual placements should change based on the competition on the table, but you shouldn't see the rabbit (in good condition) flopping from the top of the table one month to the bottom of the table the next - and back again. Sure, it's not hard to miss a good rabbit from time to time, but there are judges that every group sees often enough to get a feel for the judge's consistency.

So, what does this mean and what can we do about it? Personally, I think it is appropriate AFTER the judging is complete to chat with a judge about a rabbit's career. Especially with judges who are new or who specialize in other breeds, I think they need some feedback. If they pick a rabbit that is nationally competitive, they should be told, so they can get a feel for what their inter-rater reliability is. It lets them know they are on the right track. If, on the other hand, they place a really nice rabbit at the bottom of the class, it's a much touchier situation. You might ask the judge - again AFTER the judging - to go over a rabbit again with you. It might be that he or she needs to see you pose the rabbit properly. It could be that the judge needs a gentle reminder that Holland ears are to be evaluated on a calm rabbit. It could be the judge just missed it - as happens on a regular basis and is to be expected.

I admit, this is a slippery slope. There is the possibility that certain breeders end up training judges in their particular spin on their breed. It's really, really important that judges get input from many sources.

So, judges judge and are testing our rabbits, giving their evaluation of them. In theory, in a perfect world, every judge would apply the standard perfectly every time. But also, judges are evaluated for their inter- and intra-rater reliability. They need feedback for that as well.

Laurie Stroupe
The Nature Trail Rabbitry “Home Of Grand Champions”
 
Holland lop BLOG about daily life in my rabbitry. I share show results, my daily routine as I provide rabbit care, my challenges as a rabbit breeder, and my successes as my show rabbits develop.

Name: Laurie Stroupe
Location: Ararat, Virginia, United States

I am, if nothing else, a busy woman. But I've filled my life with people, activities, and things I love, so I wouldn't change a thing! My list of favorite things includes my husband Andrew, our four children, my Holland lop show rabbits, our long coat Chihuahuas, ballroom dancing, and my cobalt glassware, gifts, and accessories business.

This website is owned and maintained by Laurie Stroupe of The Nature Trail Rabbitry. Copyright 2005 The Nature Trail Rabbitry. No portion may be used without written permission.