The goal of LivingKnowledge is to bring a new quality to search and knowledge management technology, yielding more concise, complete and contextualised search results.
The paper “Repeatable and Reliable Search System Evaluation using Crowdsourcing”, co-written by R. Blanco, H. Halpin, D. M. Herzig, P. Mika, J. Pound and H. S. Thompson, was presented at the 34th Annual ACM SIGIR Conference (SIGIR 2011) in Beijing, China, on July 25-29, 2011.
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and a crowd-sourcing approach provides many advantages. However, can these crowd-sourced evaluations be repeated over long periods of time in a reliable manner? To demonstrate, we investigate creating an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search over web-pages with textual descriptions, object search aims at retrieving information from factual assertions about real-world objects. Using the first large-scale evaluation campaign that specifically targets the task of ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowd-sourced evaluation campaigns can be repeated over time and still maintain reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that they hold over different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.
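A common way to quantify whether crowd-sourced judgments rank systems the same way expert judges do is a rank correlation coefficient such as Kendall's tau (the specific statistic and the system names below are illustrative assumptions, not taken from the paper). A minimal sketch:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau rank correlation between two rankings.

    rank_a, rank_b: dicts mapping system name -> rank position
    (1 = best). Returns a value in [-1, 1]; 1 means identical
    orderings, -1 means fully reversed.
    """
    concordant = discordant = 0
    for s, t in combinations(rank_a, 2):
        # A pair is concordant if both rankings order s and t the same way.
        d = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if d > 0:
            concordant += 1
        elif d < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical rankings of four systems by expert vs. crowd judges
expert = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
crowd = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}
print(kendall_tau(expert, crowd))  # prints 0.666... (one swapped pair)
```

A tau close to 1, as here, indicates the two sets of judges agree on which systems are better, even if individual relevance labels differ.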