Herbarium Digitisation: Is 600dpi Evil?

I have been doing some thinking about capturing images of herbarium specimens so as to facilitate the “taxonomic process” – whatever that might be. The trigger for writing this down was a quote from an excellent series of papers on digitisation of specimens:

“Plant sheets are usually scanned at somewhere around 1000 DPI (600 DPI being now generally considered the absolute minimum requirement), which renders images in the hundred-megabyte range.” Ariño and Galicia (2005), in Christoph L. Häuser, Axel Steiner, Joachim Holstein & Malcolm J. Scoble (eds) (2005) Digital Imaging of Biological Type Specimens: A Manual of Best Practice. ENBI, Stuttgart.

If you take a picture of a herbarium specimen with a modern (10 to 20 megapixel) digital SLR you will get an image of around 300 dots per inch (dpi), measured on the specimen. This is relatively cheap and easy to do. Capturing images above this resolution requires very expensive cameras, flatbed scanners suspended upside down on special rigs, or something equally complex. More importantly, capturing images above this step in resolution slows down the capture process enormously, so fewer specimens are imaged. This simple requirement of 600+ dpi is actually a hurdle to the digitisation of herbaria, so there must be a good reason for it. I am not so sure that there is, and here I explain why.
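As a rough check on the 300dpi figure, here is a minimal sketch that works out the dpi on the specimen when a whole sheet fills the camera frame. The sheet size (11.5 x 16.5 inches) and the two sensor pixel dimensions are illustrative assumptions, not figures from a particular camera or herbarium, and the sketch ignores lens distortion and any margin left around the sheet.

```python
# Sketch: dpi on the specimen when a whole herbarium sheet fills the frame.
# ASSUMPTIONS (illustrative, not from the original post): a sheet of
# 11.5 x 16.5 inches and the hypothetical sensor pixel dimensions below.
SHEET_IN = (16.5, 11.5)  # assumed sheet size in inches (long edge, short edge)

def effective_dpi(sensor_px, sheet_in=SHEET_IN):
    """Dots per inch on the specimen with the sheet's long edge along the
    sensor's long edge. The sheet must fit on both axes, so the tighter
    axis sets the resolution."""
    long_px, short_px = sorted(sensor_px, reverse=True)
    long_in, short_in = sorted(sheet_in, reverse=True)
    return min(long_px / long_in, short_px / short_in)

# Hypothetical 3:2 sensors of roughly 10 and 21 megapixels.
for label, px in [("10 MP", (3872, 2592)), ("21 MP", (5616, 3744))]:
    print(f"{label}: {effective_dpi(px):.0f} dpi")
```

On these assumptions the two sensors give roughly 225 and 325 dpi, which is in line with the "around 300dpi" figure.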

What Botanists Actually Do

Take a look at a botanist working in a herbarium. Their work can be broken down into six phases. What resolution images would it take to carry out these stages in a virtual way with all the advantages of digital specimens?

1. Discovery: Find out that the specimens exist and where they are located. This is largely text based and requires the filing name and geographic origin data to be captured in the herbarium catalogue. Text capture can be a side effect of imaging the specimen, with its own inherent problems, but it does not require images above 300dpi.
2. Retrieval: Gain physical access to the specimens by removing folders from cabinets and placing them on a work bench, or by requesting a loan from another herbarium. This is easier with lower resolution images but can be achieved with images of any resolution. Just about any resolution above 100dpi will require some form of zooming or thumbnailing for manipulation in an interface, so 300dpi would be fine; 600+ dpi requires more resources but is equally OK.
3. Selection: Each specimen is looked at in turn. This is typically little more than a glance at approximately twice normal reading distance (700mm or more). The botanist is getting the gist of the specimen, and some specimens are selected to be examined. The selection criteria may be based on label information or on whether the specimen appears to contain suitable material, e.g. whether it is fruiting or flowering.
4. Examination: If a specimen is selected in phase 3 then it is examined in more detail. It may be picked up and held closer to the face, at about normal reading distance (350mm). Measurements down to 0.5mm may be taken using a rule.
5. Detailed Examination: The botanist may use a hand lens or long arm binocular microscope to examine parts of the specimen at 10x to 60x magnification. Depending on the taxonomic group, this phase may follow very quickly after phase 4.
6. Further Study: The capsule may be opened and its contents examined. Parts of the specimen may be removed, boiled, dissected and returned to the capsule. No image, at any resolution, will permit this activity!

Of these six phases, image resolution is only pertinent to phases 3, 4 and 5, so what resolution is required for these phases?

Visual Acuity and DPI

Visual acuity is the ability to see clearly (Wikipedia has a good page on it if you want the details). Someone with good normal vision (20/20) can distinguish two lines when the angle they subtend at the eye is 1 arc minute (1/60th of a degree, or 0.016667 degrees). This is under ideal, high contrast conditions where the lines are vertical or horizontal; under other conditions discrimination will be worse. The nearer the viewer is to the subject (down to the minimum distance at which they can focus), the smaller the things they can see. That is somewhat obvious, but here we have a rule of thumb for what people with normal vision can distinguish, and we can use good old high school trigonometry to calculate what they should be able to see at different viewing distances. How does this relate to dpi?
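The trigonometry is short enough to write out. This minimal sketch takes the 1 arc minute figure as given and returns the smallest separation resolvable at a given distance; the three distances are the close focus, reading and glancing distances used in this post. Treat it as an idealised estimate, since real acuity varies with contrast, lighting and the individual.

```python
import math

ACUITY_ARCMIN = 1.0  # 20/20 vision: two lines resolved at 1 arc minute

def resolvable_mm(distance_mm, acuity_arcmin=ACUITY_ARCMIN):
    """Smallest separation (mm) resolvable at a given viewing distance.

    The separation s subtends angle theta at the eye, centred on the line
    of sight, so s = 2 * d * tan(theta / 2), which is close to d * theta
    for such small angles.
    """
    theta = math.radians(acuity_arcmin / 60.0)
    return 2.0 * distance_mm * math.tan(theta / 2.0)

# Close focus, normal reading distance, and twice reading distance.
for d in (150, 350, 700):
    print(f"{d} mm -> {resolvable_mm(d):.3f} mm")
```

At 350mm this gives about 0.1mm, and at 700mm about 0.2mm.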

When a digital imaging device captures the real world you can think of it as sampling a surface at regular intervals, like placing a piece of graph paper over the subject and recording the colour under each square. A higher resolution is a more finely gridded graph paper. You may be tempted to think we can estimate the size of the grid squares from the angle subtended alone, and we can, but a fudge factor is needed. Because the camera is sampling reality, there is a built-in error connected to the sampling rate. The grid may not line up with features that exist in reality, so a tiny dark spot may fall on the boundary between two grid squares and neither of them picks it up faithfully. I am going to call this the Nyquist fudge factor (NFF). It is properly related to the Nyquist rate, but the math is beyond this blog. Basically, to avoid aliasing errors you have to more or less double the sampling frequency (an NFF of x2).
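Here is a toy illustration of the fudge factor. The sketch models a single row of pixels, each recording the average darkness over its width (box sampling, which is an assumption; real lenses and sensors blur differently). A dark line 0.1mm wide is slid across the grid, and we record the worst-case peak darkness that any pixel sees. With pixels the same width as the line, an unlucky alignment smears it into two half-grey pixels; with pixels half that width, at least one pixel always records it at full strength. This is a simplification for intuition, not the Nyquist sampling theorem itself.

```python
import math

def pixel_darkness(pixel_start, pitch, line_start, line_width):
    """Fraction of one pixel covered by a dark line (1-D box sampling)."""
    overlap = (min(pixel_start + pitch, line_start + line_width)
               - max(pixel_start, line_start))
    return max(0.0, overlap) / pitch

def worst_case_peak(line_width, pitch, steps=1000):
    """Slide the line across one pixel pitch and return the lowest peak
    darkness recorded by any pixel. 1.0 means some pixel always sees the
    line at full contrast; 0.5 means it can be split across two pixels."""
    worst = 1.0
    for k in range(steps):
        offset = pitch * k / steps
        # Only pixels near the line can overlap it.
        first = math.floor(offset / pitch) - 1
        last = math.ceil((offset + line_width) / pitch) + 1
        peak = max(pixel_darkness(i * pitch, pitch, offset, line_width)
                   for i in range(first, last + 1))
        worst = min(worst, peak)
    return worst

detail = 0.1  # mm, a dark line the size the eye resolves at about 350 mm
print(worst_case_peak(detail, pitch=detail))      # pixel = detail size
print(worst_case_peak(detail, pitch=detail / 2))  # NFF of x2
```

Expect roughly 0.5 for the first call and 1.0 for the second.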

Armed with these two pieces of information, the 1 arc minute angle subtended and the NFF of x2, we can work out what resolution images we need to meet the requirements of the botanist in phases 3, 4 and 5.

At normal reading distance (as used in phase 4) visual acuity is around 0.1mm, which implies dots need to be twice as frequent (the NFF) to capture this level of detail: each dot or grid square should be 0.05mm across. Converting this to inches gives about 500 dpi (25.4mm / 0.05mm = 508). This figure should guarantee capturing two lines 100 microns apart. At the 700mm viewing distance suggested for phase 3 these measurements double (0.2mm acuity, 0.1mm dots), equating to about 250 dpi.
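The conversion from detail size to dpi is simple arithmetic; this sketch just makes the NFF of x2 explicit. The function name and constants are illustrative.

```python
MM_PER_INCH = 25.4
NFF = 2.0  # Nyquist fudge factor: two samples per resolvable detail

def required_dpi(detail_mm, nff=NFF):
    """Sampling density (dots per inch) needed to capture details of detail_mm."""
    dot_mm = detail_mm / nff   # size of one dot / grid square
    return MM_PER_INCH / dot_mm

print(round(required_dpi(0.1)))  # phase 4, ~350 mm -> 508
print(round(required_dpi(0.2)))  # phase 3, ~700 mm -> 254
```

The post rounds these to 500 and 250 dpi.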

What about phase 5, where the botanist takes out a hand lens or binocular microscope? The most common magnification for a hand lens is 10x. No lens is perfect, but this would imply resolutions in the region of 5000dpi. Reproducing the effect of a good quality binocular microscope would require capturing specimens at around the 10,000 dpi mark, which is technically totally impractical and, even if it were possible, may not be desirable.

There is a big jump here. 250dpi -> 500dpi -> 10,000 dpi. Conventional photographic capture techniques can only hope to simulate a limited range of what a botanist does with a specimen. They can’t get anywhere near simulating the use of optics so we shouldn’t bother trying to do that.
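To put the jump in perspective, this sketch estimates the pixel count and uncompressed size of a whole-sheet image at each resolution. It assumes the same illustrative 11.5 x 16.5 inch sheet and 8-bit RGB (3 bytes per pixel) with no compression; real files will differ with bit depth and compression.

```python
# ASSUMPTIONS (illustrative): sheet size and uncompressed 8-bit RGB.
SHEET_IN = (11.5, 16.5)   # assumed sheet size, inches
BYTES_PER_PIXEL = 3       # 8 bits each for red, green and blue

def sheet_image(dpi, sheet_in=SHEET_IN, bpp=BYTES_PER_PIXEL):
    """Return (megapixels, megabytes) for a whole-sheet image at `dpi`."""
    w_px = round(sheet_in[0] * dpi)
    h_px = round(sheet_in[1] * dpi)
    pixels = w_px * h_px
    return pixels / 1e6, pixels * bpp / 1e6

for dpi in (250, 300, 500, 600, 1000, 5000, 10_000):
    mp, mb = sheet_image(dpi)
    print(f"{dpi:>6} dpi: {mp:10,.0f} MP {mb:12,.0f} MB")
```

On these assumptions a 600dpi sheet is about 68 megapixels (roughly 200MB uncompressed), while 10,000dpi runs to about 19 gigapixels (roughly 57GB).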

Sanity Check

If you are reading this at a PC or Mac you probably can’t see the dots that make up the screen image. If you measure from your eyeball to the screen you will find it is in the region of 700mm away. Common dot pitches for computer displays are 0.25mm to 0.31mm. My lovely iMac screen (no bias there) has a dot pitch of around 0.254mm. If I go down to my close focus distance of about 150mm I can just see the dots. Dot pitch is a difficult measure because each colour dot is made up of three sub-dots, and these may be arranged in different patterns, but it is a fair comparison for our purposes. Try this on your monitor with your own eyes. What dots can you see? The dots on your screen will be at least two and a half times the size of the 0.1mm details we are hypothesising capturing on specimens at 500dpi, i.e. imagine seeing something a half to a third the size of a dot on your screen with your naked eye. Now imagine it is not brightly lit like a screen but part of a herbarium specimen.
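The screen comparison can be checked the same way. This sketch converts the quoted dot pitches to dots per inch and compares them with the 0.1mm detail and with a single 500dpi sample; like the paragraph above, it treats each colour dot as one square and ignores the sub-dot layout.

```python
MM_PER_INCH = 25.4
DETAIL_MM = 0.1                     # detail resolvable at ~350 mm (phase 4)
DOT_500DPI_MM = MM_PER_INCH / 500   # one sample at 500 dpi, about 0.05 mm

def compare_pitch(pitch_mm):
    """Return (screen dpi, ratio to 0.1 mm detail, ratio to a 500 dpi dot)."""
    return (MM_PER_INCH / pitch_mm,
            pitch_mm / DETAIL_MM,
            pitch_mm / DOT_500DPI_MM)

for pitch in (0.31, 0.254, 0.25):
    dpi, vs_detail, vs_dot = compare_pitch(pitch)
    print(f"{pitch} mm pitch = {dpi:.0f} dpi; {vs_detail:.1f}x the detail; "
          f"{vs_dot:.1f}x a 500 dpi dot")
```

A 0.254mm pitch works out at 100 dots per inch, two and a half times the 0.1mm detail and five times a single 500dpi sample.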

You can cheat and use a hand lens to look at your screen if you like but remember you are then jumping to warp drive – and we are still only able to do impulse drive.

Looking at this sanely I can only conclude that 500dpi is the maximum needed to simulate phases 3 and 4 of a botanist’s work and that phase 5 simply can’t be done at the level of capturing the whole specimen. 300 dpi is probably plenty.

Conclusions

This is just my opinion, expressed to stimulate thought rather than as the basis for some ultimate standard approach, but I hope that it illustrates a danger. 600dpi was the old rule-of-thumb resolution for producing images for printed materials. It maps well to the 300 lines per inch (lpi) used in standard quality halftone screens for printed works (think of the Nyquist fudge factor, now applied to the conversion from dots in the computer back to lines on a page), but it has been carried forward into the purely digital age, where the images are unlikely to be printed. This single figure of 600dpi has shaped the whole culture of digitising herbarium specimens in large herbaria.

I have only discussed a tiny aspect of the digitisation chain. I’ve not mentioned the importance of focus, camera shake, noise, colour or compression of files. The interesting stuff really starts with the electronic workflow that could be handling the images coming out of these workstations in an entirely automated way. But that is another story…

(Thanks to Bob Morris for some comments on this)

Biodiversity Informatics

Comments (9)

Kehan says:

2008/12/04 at 3:59 pm

I believe the 600dpi rule of thumb may have come out of the common herbarium practice of using cibachrome images of specimens should the original not be available (eg K borrowed types of Aus bus L. from E), and these have been accepted by taxonomists as reasonable surrogates that you can observe through a handlens/dissecting microscope and apparently (no source) cibachrome resolution is approximately 600 DPI which led to the decision that this is the minimum required resolution. I do however agree with you that the efficiency of using scanners for this work is certainly questionable with the high speed sooper dooper digital slrs out today (and maybe the argument will be moot next year when DSLRs may well be able to take effectively >600dpi images).
Mary Barkworth says:

2009/01/11 at 3:02 pm

[There are no specimen images on the Website]
Thank you Roger for inserting some logic into the discussion of image resolution. My basic comment is that your eyesight is much better than mine. I routinely use 20x for examining specimens. If we consider using images for verifying an identification, it may be necessary to see what kinds of hairs are present on a surface. One can serve up lower resolution images for gaining an impression of what a taxon looks like, adding the ability to zoom in on any particular specimen. What I do not want to do is feel a need to reimage the collection (nor hear that my successor thinks it necessary). Having said that, your comments are really valuable. Thank you for sending me the link.
Kehan says:

2009/02/27 at 12:52 pm

Well this may be the day you were waiting for – a new device that automates the production of ‘very high res images’ using consumer cameras.
http://www.newscientist.com/article/dn16674-science-gets-a-boost-from-cheap-superresolution-snapshots.html?DCMP=OTC-rss&nsref=online-news
It’s even been used on herbarium specimens:
http://gigapan.org/viewGigapan.php?id=11397
– check the pollen on that Hibiscus out.
admin says:

2009/02/27 at 1:03 pm

Ha ha! I have just got one of these robot heads but not had much of a chance to play with it because the weather has been so bad etc etc. Here is an early attempt on a rainy day.

http://gigapan.org/viewGigapan.php?auth=e977f451bd58e7db5531ca3dbc14bbad

I had thought it might be fun to point it at a herbarium specimen, and it’s great to see someone has actually done it.

Seriously this is a lot like scanning. The camera will take up to 200 photos per image but it takes 2 to 5 seconds per photo. The herbarium shot probably took a minute to expose and a few minutes to stitch. Plus they have made adjustments in photoshop. A lot of fun though!
Kehan says:

2009/03/03 at 11:58 am

Looks pretty good – I like the landmarks feature as well. I hate to think how long the image stitching takes (or maybe it doesn’t stitch them it just turns them into tiles for the gigapan site?). But you’re right – loads of photos per specimen can certainly take a bit of time.
Steve says:

2011/01/02 at 7:53 pm

Just stumbled across this. I realise I’m a few years behind… where are we with this now? I was inspired by this recent article http://www.pnas.org/content/107/51/22169.abstract that suggests we have many species of plant waiting to be described from specimens already collected in herbarium cupboards. Scanning these and getting them online as quickly as possible would surely help this process (or is it just that we don’t have enough botanists…?!). There are projects such as GPI (http://gpi.myspecies.info/content/all-vascular-types-line-global-plants-initiative) focussing on scanning types where the high res may be justifiable, but would it be beneficial for a ‘quick and dirty’ scan of all specimens? I’d really like to see the Kew herbarium (all 7 million+ specimens) scanned as soon as possible, but am not sure whether the focus should be on maintaining the high quality scans of GPI (would take donkey’s years to scan all specimens at Kew) or lower res, quick images is better – or a bit of both?! thanks.
Roger Hyam says:

2011/01/03 at 10:43 am

Thanks for your comments Steve.

I think it all comes down to use-case driven arguments (http://en.wikipedia.org/wiki/Use_case). Very few decisions are based on practical use-cases let alone testing of these use-cases prior to prioritising implementation of large projects.
Rafaa says:

2012/07/04 at 11:45 pm

I am working at the Tripoli University of Libya and looking to have a website for our Herbarium, so I really need some advice about how to digitise herbarium specimens to get good quality images.
Could you please send me any simple technique to use?

kind regards
Rafaa
Rafaa says:

2012/11/15 at 3:55 am

HI there
I am a botanist and would like to take many photos of our herbarium specimens to upload them to our database, and I am new to digitising herbarium specimens. I also have an advanced camera (Sony Alpha 55) but I really don’t know how to use it for this purpose. Could you please help me use this camera to take images of our herbarium specimens?
kind regards
rafaa
