FEEDBACK ON A PUBLICLY DISTRIBUTED IMAGE DATABASE: THE MESSIDOR DATABASE

The Messidor database, which contains hundreds of eye fundus images, has been publicly distributed since 2008. It was created by the Messidor project in order to evaluate automatic lesion segmentation and diabetic retinopathy grading methods. Designing, producing and maintaining such a database entails significant costs. By publicly sharing it, one hopes to bring a valuable resource to the public research community. However, the actual interest of the research community, and the benefit it derives from the database, are not easy to quantify. We analyse here the feedback on the Messidor database after more than 6 years of distribution. This analysis should apply to other similar research databases.


INTRODUCTION
Public databases are precious tools for researchers. They bring the necessary data to develop and test new methods, and allow for quantitative comparisons between different approaches. The Messidor database is one such database. It was created within the Messidor project to evaluate different lesion segmentation methods for color eye fundus images, in the framework of diabetic retinopathy screening and diagnosis. It has been publicly distributed since 2008.

THE MESSIDOR DATABASE
The Messidor download page gives an appropriate description of the database, which we quote here: "The 1200 eye fundus color numerical images of the posterior pole for the MESSIDOR database were acquired by 3 ophthalmologic departments using a color video 3CCD camera on a Topcon TRC NW6 non-mydriatic retinograph with a 45 degree field of view. The images were captured using 8 bits per color plane at 1440*960, 2240*1488 or 2304*1536 pixels. 800 images were acquired with pupil dilation (one drop of Tropicamide at 0.5%) and 400 without dilation.
The 1200 images are packaged in 3 sets, one per ophthalmologic department. Each set is divided into 4 zipped sub sets containing each 100 images in TIFF format and an Excel file with medical diagnoses for each image." Note that, as the description indicates, the database contains a medical diagnosis for each image, but no manual annotations on the images, such as lesion contours or positions. This is an important difference with respect to other databases, such as DIARETDB1 and e-ophtha.
The download procedure asks the user to fill in the following fields: E-mail address; First Name; Last Name; Professional Interests; Country; and University/Organization. An e-mail is then sent to a member of the Messidor team, who checks the validity of the request and sends an appropriate link to the submitter. Some requests are not accepted, typically because the requested fields have clearly been filled in incorrectly. Precise statistics on refused requests are not kept, but we estimate that they represent less than 25% of the total number of requests. They are not taken into account in the statistics below.
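As an illustration only, the kind of basic check applied to a download request could be sketched as follows. This is not the procedure used by the Messidor team, which relies on a manual check by a team member; the field keys, the `check_request` function and the loose e-mail pattern are all hypothetical choices made for this sketch.

```python
import re

# Hypothetical keys standing for the fields listed in the download procedure.
REQUIRED_FIELDS = (
    "email",
    "first_name",
    "last_name",
    "professional_interests",
    "country",
    "organization",
)

# Deliberately loose pattern (an assumption, not a full address validator):
# one "@", no whitespace, and a dot somewhere in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_request(form):
    """Return a list of problems found in a download request (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = form.get(field, "")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing field: {field}")
    email = form.get("email", "")
    if isinstance(email, str) and email.strip():
        if not EMAIL_PATTERN.match(email.strip()):
            problems.append("malformed e-mail address")
    return problems
```

A request for which `check_request` returns an empty list would still need a human decision in the procedure described above; a check of this kind can only filter out forms that are clearly incomplete.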
It should be noted that Messidor database users are asked to acknowledge the Messidor project partners in their related publications.

EXPERIENCE FEEDBACK ON MESSIDOR
Most of the statistics on the diffusion of the Messidor database presented in this section are summarized in Fig. 1.
People tend to underestimate the support and maintenance costs associated with a publicly distributed database. For instance, given the increasing number of download requests for the Messidor database, processing these requests and the related questions requires approximately one hour per week. On top of that, users ask general questions about the database, even though most answers are available on the web site. Finally, hosting the database and web pages also takes resources.
Another measure of the success of the database can be obtained through the access statistics of the corresponding web page (see Table 2). Again, one can see a clear increase in web site access since 2008. The number of visitors is approximately two times higher in 2013 than in 2011. This trend clearly appears in Fig. 1.
The link between download requests or web access and the actual contribution to the research domain is not necessarily simple to assess. Indeed, people might download the database or consult the web site for reasons not related to public research. In order to clarify this point, we have looked into the number of citations of the Messidor database in scientific papers. The results are summarized in Table 3. Interestingly, the Messidor database was cited three times more often in 2013 than in 2011, the same increase as for the number of download requests (see Fig. 1). Finally, if we pool the results for two of the most cited journals in the field of biomedical image processing, namely Medical Image Analysis and IEEE Transactions on Medical Imaging, we find that, since 2008, 47 papers deal with "diabetic retinopathy", and that 10 of these papers cite the Messidor database.
Note that other databases used in the same domain follow similar trends. DIARETDB1, which has been distributed since 2007, has been cited 295 times (as of June 19, 2014), while HEI-MED, which was established in 2012, has been cited 26 times.
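Two of the figures quoted in this section can be turned into more directly comparable quantities with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the 52-weeks-per-year conversion and the variable names are assumptions made for illustration.

```python
# Figures quoted in the text of this section.
papers_on_dr = 47            # papers dealing with "diabetic retinopathy" since 2008
papers_citing_messidor = 10  # of those, papers citing the Messidor database
support_hours_per_week = 1   # approximate time spent on download requests

# Assumption: support effort is constant over 52 weeks per year.
WEEKS_PER_YEAR = 52

share_citing = papers_citing_messidor / papers_on_dr
support_hours_per_year = support_hours_per_week * WEEKS_PER_YEAR

print(f"Share of pooled papers citing Messidor: {share_citing:.1%}")
print(f"Approximate support time per year: {support_hours_per_year} hours")
```

In other words, roughly one in five of the pooled papers on diabetic retinopathy cites the database, for a request-handling effort in the order of fifty hours per year.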

CONCLUSION
The Messidor database has been publicly distributed since 2008. It is of interest mainly to researchers in a relatively specialized domain: retinal image processing, and more specifically computer-assisted diagnosis of diabetic retinopathy. In spite of this, it has gathered a large number of citations. We have also shown that both the number of web site visitors and the number of download requests appear to correlate well with the number of citations, which provides a simple and convenient method to monitor the success of a database.
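For managers of other databases who wish to check such a relationship on their own yearly counts, a Pearson correlation coefficient is one possible measure. The sketch below is only an illustration: the analysis reported here is based on comparing trends, not on this computation, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

The function would be called with, for instance, the yearly download requests as `xs` and the yearly citations as `ys`. With only a handful of yearly values, the resulting coefficient should be read as indicative rather than as a statistically robust estimate.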
The experience gathered by our team on the management of the Messidor database allows us to propose some recommendations for the design of future databases:
- Hosting and managing the database takes resources; this point should be taken into account during the database design, in order to reduce this cost as much as possible.
- The database is typically described on a web page. This description has to be clear and complete, in order to limit the number of requests for additional information (and therefore to reduce the management cost).
- The database managers should ask potential users to acknowledge the database or, better, to cite a relevant paper on the database. This simplifies the evaluation of the success of the database.
- Last but not least, we have shown that an automatic validation procedure seems to be enough to process download requests.
Moreover, we believe that this study confirms the important role that databases play in medical image processing. In the case of the Messidor database, this is true in spite of the fact that the images it contains are progressively becoming outdated. Indeed, they were acquired before 2007, and modern fundus cameras offer increasing image resolutions and sensitivities. As far as we know, only two databases have been released in this field after 2010: HEI-MED (for exudate-based macular oedema detection) and e-ophtha (for microaneurysm and exudate segmentation). This stresses the importance of new databases that correspond to current clinical practice.

Fig. 1. Evolution of the number of citations, web site visitors and download requests.

Table 1 gives the number of download requests per year between 2011 and 2013, broken down by country. It can be seen that download requests clearly increase over time: there were approximately three times more requests in 2013 than in 2011. This increase comes mainly from less developed countries.

Table 1. Download requests for the Messidor database, per year. Some countries, from which only a few requests originated, are not indicated.

Table 3. Citations per year. Values were obtained through Google Scholar using the keywords "Messidor diabetic retinopathy" on June 19, 2014.