The paperless office with Linux

A few days ago I took delivery of a used Fujitsu ScanSnap S1500 (currently about 400€ new, I got mine for 235€ on eBay), and started on the long job of making my home office paperless.

xsane

The best news: it works out of the box with Linux (Ubuntu 11.10).  Just install xsane as scanning software and you’re up and running.  xsane is great for custom scanning, where you want some colour, a higher resolution (the scanner does up to 600dpi), lineart, duplex …

scanbuttond

…but for your run-of-the-mill office scanning, you probably just want grey at 150dpi with fairly high compression (comes out at ~150 kB per PDF page), and the ability to whack in stacks of paper and just keep on hitting the “Go” button.  For this I installed scanbuttond and sane-utils on my home server (a little box under my table) and put together a little buttonpressed.sh script, so every time the button is pressed it creates a PDF in a network shared folder.  That has the advantage that I can access my scanned documents from any computer in the house, and even scan without turning my desktop on!

#!/bin/bash
OUT_DIR=/mnt/raid/scan
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR" || exit 1
echo "################## Scanning ###################"
scanimage \
 --resolution 150 \
 --batch=scan_%03d.tif --format=tiff \
 --mode Gray \
 --device-name "fujitsu:ScanSnap S1500:7739" \
 -y 297 -x 210 \
 --page-width 210 --page-height 297 \
 --sleeptimer 1
echo "############## Converting to PDF ##############"
# Use tiffcp to combine the output tiffs into a single multi-page tiff
#tiffcp -c lzw scan_*.tif output.tif
tiffcp scan_*.tif output.tif
# Convert the tiff to PDF (JPEG compression at quality 60, A4 page size)
tiff2pdf -j -q 60 -p A4 output.tif > "$OUT_DIR/scan_$(date +%Y%m%d-%H%M%S).pdf"
cd ..
echo "################ Cleaning Up ################"
rm -rf "$TMP_DIR"

I took much of the inspiration for the script from this article, which also uses tesseract for OCR.  That approach just writes the recognised text to a separate text file, though, which I don’t like, so I’m still looking for a way to embed the detected text into the PDF.

As you can see, I had to hard-code the scanner name: scanbuttond (last updated in 2006…) passes the device address, but the current version of scanimage needs the device name as given by scanimage -L, so they’re not really compatible with each other any more… :-/
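One way around the hard-coded name might be to look it up at run time.  The following is only a sketch: it assumes that scanimage -L prints lines of the form device `NAME' is a …, and find_fujitsu_device is a made-up helper name, not something that ships with sane-utils.

```shell
#!/bin/bash
# Hypothetical helper (not part of sane-utils): read `scanimage -L` style
# output on stdin and print the first device name that uses the fujitsu backend.
# Assumes each line looks like:  device `NAME' is a VENDOR MODEL scanner
find_fujitsu_device() {
    sed -n "s/^device \`\(fujitsu:[^']*\)' .*/\1/p" | head -n 1
}

# Intended use inside buttonpressed.sh (left commented out here, since it
# needs a connected scanner):
#   DEVICE=$(scanimage -L | find_fujitsu_device)
#   scanimage --device-name "$DEVICE" ...
```

If no fujitsu device is listed the function prints nothing, so the calling script should check for an empty result before it starts scanning.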

I’ve also set it up so that all PDFs will be A4 and, like I said earlier, only 150dpi with pretty lossy JPEG compression.  That’s my default preference, YMMV.
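To put those settings into perspective, here is a rough back-of-the-envelope calculation of how big an uncompressed A4 page is at a given resolution.  It is a sketch under one assumption: that Gray mode means 8 bits (one byte) per pixel.  The function names are just for illustration.

```shell
#!/bin/bash
# Pixels along one edge: millimetres -> inches (25.4 mm each) -> pixels,
# rounded to the nearest whole pixel using integer arithmetic only.
mm_to_px() {
    local mm=$1 dpi=$2
    echo $(( (mm * dpi * 10 + 127) / 254 ))
}

# Uncompressed size in bytes of a greyscale page,
# assuming one byte per pixel (8-bit grey).
raw_page_bytes() {
    local width_mm=$1 height_mm=$2 dpi=$3
    echo $(( $(mm_to_px "$width_mm" "$dpi") * $(mm_to_px "$height_mm" "$dpi") ))
}

raw_page_bytes 210 297 150   # A4 at 150dpi, prints 2174960
```

So a raw A4 page at 150dpi is about 2.2 MB (1240 x 1754 pixels), which means the ~150 kB per page I get corresponds to a reduction of roughly 14 times.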

The S1500 in detail

Now a little about the scanner itself.  It’s about the size and weight of a compact inkjet printer (or a cat).

The fold-in/out mechanism is pretty easy, so even though I’m ultimately lazy, I might even flip it shut when I’m done to protect it from dust.  Other than that there’s not much to say… it has one button, and it’s bright blue (I know some people may have a fit at that…).  It comes with a 240VAC to 24VDC adapter and a USB cable.  The paper feed opens with a little button on the right, and the insides are readily understandable and cleanable.  Did I mention it has two scanning heads, so it does duplex? 🙂

De-papering my office

My first task with the scanner was to scan in my business receipts from last year – that’s about 300 items, but many of the smaller receipts (bus tickets etc.) are pasted onto A4 sheets (many to a sheet).  It took me about 30 minutes to scan the lot, including the time to remove any staples or clips (a must!), and a few paper jams.  I have no idea how long that would have taken with my old flatbed…

The S1500 isn’t immune to paper jams, but I was surprised to see it handled all the worst sheets (lots of different receipts pasted to one sheet) easily.  It only had difficulty with the recycled paper we use, where the individual sheets stick to one another a bit more because of the rougher surface.  With a bit of practice fanning the sheets this isn’t much of an issue either, but you do have to keep an eye on it as it’s scanning to be sure it picked up each individual sheet.
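If you know how many sheets went into the feeder, a small check in the script could catch a double feed afterwards.  This is only a sketch: count_pages and check_page_count are hypothetical helpers, and they assume the scan_NNN.tif naming used in buttonpressed.sh and single-sided scanning (one file per sheet).

```shell
#!/bin/bash
# Hypothetical helper: count the scan_*.tif files in a directory.
count_pages() {
    local dir=$1 n=0 f
    for f in "$dir"/scan_*.tif; do
        # The glob stays unexpanded when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Hypothetical helper: compare the page count with the number of sheets fed.
# Returns non-zero on a mismatch so the calling script can react to it.
check_page_count() {
    local dir=$1 expected=$2 actual
    actual=$(count_pages "$dir")
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual pages"
    else
        echo "MISMATCH: expected $expected pages, got $actual"
        return 1
    fi
}
```

It could be called as check_page_count "$TMP_DIR" 12 right after the scanimage step, before the temporary directory is cleaned up.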

All in all, I’m very happy with the decision, and am looking forward to shredding and archiving lots of paper out of my office!

 



20 Responses to “The paperless office with Linux”

  1. Scott Singer Says:

    Thanks for the helpful post. I just bought the same scanner and it works fine with Ubuntu 12.04 out of the box with Simple Scan and other sane-based front-ends. I modified the buttonpressed.sh example script in your post and that runs fine from the command line. I’m having trouble getting scanbuttond to sense any button events however. When I start the daemon, the syslog says: scanbuttond: no supported devices found. rescanning in a few seconds…

    Clearly, the other sane front-ends have no difficulty finding the scanner. Did you have to do something special in the initscanner.sh script to get scanbuttond working properly?

  2. scanbuttond runs as a particular user, and this user (or group) must have the correct permissions for the device… Probably there’s a difference between the permissions of your login user and those of the scanbuttond user. Use getfacl/setfacl to make sure the group/user that scanbuttond is running as has permissions (ACL) for the device. I had the same problem on one box where the install (of scanbuttond) failed first time round (I think facl may not be in the dependency list for the package).

  3. Future scanner Says:

    Bookmarking!

    I have a feeling this will be very helpful after I get my S1500. Thank you!

  4. Umakanth Akkineni Says:

    Thank you Robin. This script is very useful. It is much faster than its GUI-based competitors for Linux.

  5. Thank you for this article, it has been very useful. However, I’m having a similar issue to Scott’s, in that everything is functional (buttonpressed.sh works fine with the S1500) except that scanbuttond does not recognize a button press event. I have reset permissions both on the scanner and user with setfacl and still no result. The logs come back with
    scanbuttond: rescanning devices
    scanbuttond: no supported devices found. rescanning in a few seconds
    Any suggestions that might lead me in the right direction?
    Thank you in advance

  6. Thank you for a useful article.

    It’s worth noting that support for the Fujitsu ScanSnap S1500 in scanbuttond is under continuing development, as described at

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584

  7. […] Clarke; The paperless office with Linux; In His Blog; […]

  8. Robin,

    I’m curious, do you not see the “no known scanner found yet” message in syslog when scanbuttond starts?

    I have created the udev rules to make sure that the user scanbuttond is run as does have access to the scanner (tested with scanimage), but the button still doesn’t work.

  9. Originally I didn’t have that error. Now I do… Sorry, but don’t know the cause, and no time to search. In the mean time I have changed my setup using a as a scanner/printer server with buttons to set target (printer/file on smb share), quality (300/600dpi), colour/greyscale etc. I’ll write a post about that asap.

  10. Interesting, but I guess it (scanbuttond detects the button press) still works for you with the s1500? I guess that’s the bottom line here (as long as it works, i don’t care about the error messages).

    I’m working on writing some web scripts so I can use my phone/any computer with a browser, to connect and set parameters such as the ones you describe through a web GUI. I could just have that present a “scan” button, but it’d be nice to use the one on the device itself.

  11. Robin,

    Any update on “I don’t like that, so I’m still looking for a way to embed the detected text into the pdf…”?

    I really want to do this with my Linux server, but I want to have embedded search capability.

  12. Hi Graham,

    Sorry… haven’t found anything. Haven’t looked either in the last months/year. If you find something, please let me know! 🙂

    -Robin-

  13. For embedded search, I’ve been using pdfocr – https://launchpad.net/~gezakovacs/+archive/pdfocr. It’s been working well for me. Thanks for the info!

    Andy

  14. I had the same problem with scanbuttond not recognizing my ScanSnap. I’m using Ubuntu.

    The reason is, the “stable”-branch of scanbuttond just doesn’t have support for ScanSnap devices. However, in the Debian experimental package repository is a scanbuttond package with support built in. So if you’re using a Debian-style distro, here is a solution:

    Read here on how to install experimental packages:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584

    I had to add the corresponding PGP keys to my keyring also:
    $ add-apt-key

    Link to the Debian bug list leading me to this solution:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584

    Works like a charm now, thanks Robin for your script!

  15. Sorry, the first link regarding how to install experimental packages in general, is:
    http://wiki.debian.org/DebianExperimental

  16. Another note I’d like to add:

    To scan the front and the back of your paper, add this option to the scanimage command:
    --source "ADF Duplex"

    Bastian

  17. Thanks for your help Bastian! I’ve actually moved away from scanbuttond now, and am using a Raspberry Pi as a dedicated printer/scanner server. Advantage is that I can use many more buttons/switches, and neater functions behind them than the one button on the scanner itself.

  18. Thanks for the info on scanbuttond, installing the experimental package on Ubuntu 13.04 works.

    What I do before converting to PDF is run unpaper on the scanned images.

    Change the scanimage format to --format=pnm
    and run unpaper like this:
    unpaper --size a4 --overwrite scan_%03d.pnm unpapered_%03f.pnm

    Convert the pnm files to tiff for further processing:
    for i in unpapered_*.pnm; do pnmtotiff "$i" > "$i.tiff"; done

    Same as above, but with a different filename:
    tiffcp unpapered_*.tiff output.tif

  19. sorry, c’n’p error, the correct line is:

    unpaper --size a4 --overwrite scan_%03d.pnm unpapered_%03d.pnm