The paperless office with Linux
A few days ago I took delivery of a used Fujitsu ScanSnap S1500 (currently about 400€ new, I got mine for 235€ on eBay), and started on the long job of making my home office paperless.
xsane
The best news: it works out of the box with Linux (Ubuntu 11.10). Just install xsane as scanning software and you’re up and running. xsane is great for custom scanning, where you want some colour, a higher resolution (the scanner does up to 600dpi), lineart, duplex …
scanbuttond
…but for your run-of-the-mill office scanning, you probably just want grey at 150dpi with fairly high compression (comes out at ~150 kB per PDF page), and the ability to whack in stacks of paper and just keep on hitting the “Go” button. For this I installed scanbuttond and sane-utils on my home server (a little box under my table) and put together a little buttonpressed.sh script, so that every time the button is pressed it creates a PDF in a network shared folder. This has the advantage that I can access my scanned documents from any computer in the house, and even scan without turning my desktop on!
#!/bin/bash

OUT_DIR=/mnt/raid/scan
TMP_DIR=`mktemp -d`
cd $TMP_DIR

echo "################## Scanning ###################"
scanimage \
  --resolution 150 \
  --batch=scan_%03d.tif --format=tiff \
  --mode Gray \
  --device-name "fujitsu:ScanSnap S1500:7739" \
  -y 297 -x 210 \
  --page-width 210 --page-height 297 \
  --sleeptimer 1

echo "############## Converting to PDF ##############"
#Use tiffcp to combine output tiffs to a single multi-page tiff
#tiffcp -c lzw scan_*.tif output.tif
tiffcp scan_*.tif output.tif

#Convert the tiff to PDF
tiff2pdf output.tif -j -q 60 -p A4 > $OUT_DIR/scan_`date +%Y%m%d-%H%M%S`.pdf

cd ..

echo "################ Cleaning Up ################"
rm -rf $TMP_DIR
I took much of the inspiration for the script from this article, which also uses tesseract for OCR, but that just makes a separate text file with the recognised text… I don’t like that, so I’m still looking for a way to embed the detected text into the pdf…
As you can see, I had to hard-code the scanner name because scanbuttond (last updated in 2006…) passes the device address, but the current version of scanimage needs the device name as given by scanimage -L, so they’re not really compatible with each other any more… :-/
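One way to avoid hard-coding the name would be to read it from scanimage -L at run time. The following is only a minimal sketch, not part of the script above: the helper name first_device_name is made up for illustration, and it assumes scanimage -L prints one line per scanner in the form "device, back-quoted name, is a ...", which may differ between SANE versions.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): print the name of
# the first device found in "scanimage -L" style output read from stdin.
# Assumes lines shaped like:  device `NAME' is a VENDOR MODEL scanner
first_device_name() {
    local line name
    while IFS= read -r line; do
        case "$line" in
            'device `'*"' is a "*)
                name=${line#'device `'}    # strip the leading "device `"
                name=${name%%"' is a "*}   # strip "' is a ..." to the end
                printf '%s\n' "$name"
                return 0
                ;;
        esac
    done
    return 1   # no device line seen
}

# Intended use inside buttonpressed.sh (needs a connected scanner):
#   DEVICE=$(scanimage -L | first_device_name) || exit 1
#   scanimage --device-name "$DEVICE" ...
```

With more than one scanner attached this simply picks the first one listed, so a fixed name may still be the safer choice in that case.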
I’ve also set it such that all pdfs will be A4, and like I said earlier, only 150dpi, and pretty lossy jpeg compression – that’s my default preference, YMMV.
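If your preferences differ, the fixed values could be pulled out into variables with defaults. This is a rough sketch under my own naming: RESOLUTION, MODE, JPEG_QUALITY, PAPER and show_settings are illustrative names, not something the script above defines.

```bash
#!/bin/bash
# Defaults mirror the values used in the script above; each one can be
# overridden from the environment. All names here are illustrative.
RESOLUTION=${RESOLUTION:-150}      # dpi, passed to scanimage --resolution
MODE=${MODE:-Gray}                 # passed to scanimage --mode
JPEG_QUALITY=${JPEG_QUALITY:-60}   # passed to tiff2pdf -q
PAPER=${PAPER:-A4}                 # passed to tiff2pdf -p

# Print a one-line summary of the active settings, e.g. for the log.
show_settings() {
    echo "scan: ${RESOLUTION}dpi ${MODE}, pdf: ${PAPER} jpeg q=${JPEG_QUALITY}"
}

# The two commands would then use the variables instead of literals:
#   scanimage --resolution "$RESOLUTION" --mode "$MODE" ...
#   tiff2pdf output.tif -j -q "$JPEG_QUALITY" -p "$PAPER" > ...
```

A one-off colour scan at higher resolution would then be something like RESOLUTION=300 MODE=Color ./buttonpressed.sh, assuming the scanner backend accepts those values.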
The S1500 in detail
Now a little about the scanner itself. It’s about the size and weight of a compact inkjet printer (or a cat).
The fold-in/out mechanism is pretty easy, so I think even though I’m ultimately lazy, I might even flip that shut when I’m done to dust protect it. Other than that there’s not much to say… it has one button (bright blue… I know some people may have a fit at that…). It comes with a 240VAC to 24VDC adapter and a USB cable. The paper feed opens with a little button on the right, and the insides are readily understandable and cleanable. Did I mention it has two scanning heads, so it does duplex? 🙂
De-papering my office
My first task with the scanner was to scan in my business receipts from last year – that’s about 300 items, but many of the smaller receipts (bus tickets etc.) are pasted onto A4 sheets (many to a sheet). It took me about 30 minutes to scan the lot, including the time to remove any staples or clips (a must!), and a few paper jams. I have no idea how long that would have taken with my old flatbed…
The S1500 isn’t immune to paper jams, but I was surprised to see it handled all the worst sheets (lots of different receipts pasted to one sheet) easily, and only had difficulty with the recycled paper we use, where the individual sheets stick a bit more to one another because of the rougher surface. With a bit of practice fanning the sheets, this isn’t much of an issue either, but you do have to keep an eye on it as it’s scanning to be sure it got each individual sheet…
All in all, I’m very happy with the decision, and am looking forward to shredding and archiving lots of paper out of my office!
July 3rd, 2012 at 03:32
Thanks for the helpful post. I just bought the same scanner and it works fine with Ubuntu 12.04 out of the box with Simple Scan and other sane-based front-ends. I modified the buttonpressed.sh example script in your post and that runs fine from the command line. I’m having trouble getting scanbuttond to sense any button events however. When I start the daemon, the syslog says: scanbuttond: no supported devices found. rescanning in a few seconds…
Clearly, the other sane front-ends have no difficulty finding the scanner. Did you have to do something special in the initscanner.sh script to get scanbuttond working properly?
July 3rd, 2012 at 08:41
scanbuttond runs as a particular user, and this user (or group) must have the correct permissions for the device… Probably there’s a difference between the permissions of your login user and those of the scanbuttond user. Use getfacl/setfacl to make sure the group/user that scanbuttond is running as has permissions (ACL) for the device. I had the same problem on one box where the install (of scanbuttond) failed first time round (I think facl may not be in the dependency list for the package).
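A rough sketch of how that check might look follows. The helper name usb_node_from_lsusb and the SOMEUSER placeholder are made up for illustration, and it assumes lsusb prints lines of the form "Bus 001 Device 004: ID ..." with the device node living under /dev/bus/usb.

```bash
#!/bin/bash
# Hypothetical helper: read lsusb output on stdin and print the device node
# path for the first line containing the substring given as $1.
usb_node_from_lsusb() {
    awk -v pat="$1" 'index($0, pat) {
        sub(/:$/, "", $4)                       # "004:" -> "004"
        printf "/dev/bus/usb/%s/%s\n", $2, $4   # bus number, device number
        exit
    }'
}

# Intended use (run on the machine with the scanner attached):
#   NODE=$(lsusb | usb_node_from_lsusb "ScanSnap")
#   getfacl "$NODE"                          # show who may access the device
#   sudo setfacl -m u:SOMEUSER:rw "$NODE"    # SOMEUSER is a placeholder
```

An ACL set by hand like this normally does not survive unplugging the scanner or a reboot, since the device node is recreated, so a udev rule is probably the better long-term fix.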
July 4th, 2012 at 11:45
Bookmarking!
I have a feeling this will be very helpful after I get my S1500. Thank you!
August 2nd, 2012 at 12:54
Thank you Robin. This script is very useful. It is much faster than the GUI-based alternatives for Linux.
September 10th, 2012 at 06:27
Thank you for this article, it has been very useful. However, I’m having a similar issue to Scott in that everything is functional (buttonpressed.sh works fine with the S1500) except that scanbuttond does not recognize a button press event. I have reset permissions both on the scanner and user with setfacl and still no result. The logs come back with
scanbuttond: rescanning devices
scanbuttond: no supported devices found. rescanning in a few seconds
Any suggestions that might lead me in the right direction?
Thank you in advance
September 26th, 2012 at 04:17
Thank you for a useful article.
It’s worth noting that support for the Fujitsu ScanSnap S1500 in scanbuttond is under continuing development, as described at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584
January 26th, 2013 at 11:50
Robin,
I’m curious, do you not see the “no known scanner found yet” message in syslog when scanbuttond starts?
I have created the udev rules to make sure that the user scanbuttond is run as does have access to the scanner (tested with scanimage), but the button still doesn’t work.
January 27th, 2013 at 12:22
Originally I didn’t have that error. Now I do… Sorry, but don’t know the cause, and no time to search. In the mean time I have changed my setup using a as a scanner/printer server with buttons to set target (printer/file on smb share), quality (300/600dpi), colour/greyscale etc. I’ll write a post about that asap.
January 27th, 2013 at 10:27
Interesting, but I guess it (scanbuttond detecting the button press) still works for you with the S1500? I guess that’s the bottom line here (as long as it works, I don’t care about the error messages).
I’m working on writing some web scripts so I can use my phone/any computer with a browser, to connect and set parameters such as the ones you describe through a web GUI. I could just have that present a “scan” button, but it’d be nice to use the one on the device itself.
January 29th, 2013 at 06:10
Robin,
Any update on “I don’t like that, so I’m still looking for a way to embed the detected text into the pdf…”?
Really want to do this with my Linux server but want to have embedded search capability.
January 29th, 2013 at 06:40
Hi Graham,
Sorry… haven’t found anything. Haven’t looked either in the last months/year. If you find something, please let me know! 🙂
-Robin-
February 7th, 2013 at 05:33
For embedded search, I’ve been using pdfocr – https://launchpad.net/~gezakovacs/+archive/pdfocr. It’s been working well for me. Thanks for the info!
Andy
May 15th, 2013 at 08:56
I had the same problem with scanbuttond not recognizing my ScanSnap. I’m using Ubuntu.
The reason is, the “stable”-branch of scanbuttond just doesn’t have support for ScanSnap devices. However, in the Debian experimental package repository is a scanbuttond package with support built in. So if you’re using a Debian-style distro, here is a solution:
Read here on how to install experimental packages:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584
I had to add the corresponding PGP keys to my keyring also:
$ add-apt-key
Link to the Debian bug list leading me to this solution:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677584
Works like a charm now, thanks Robin for your script!
May 15th, 2013 at 08:58
Sorry, the first link regarding how to install experimental packages in general, is:
http://wiki.debian.org/DebianExperimental
May 15th, 2013 at 10:41
Another annotation I like to add is:
To scan the front and the back of your paper, add this option to the scanimage command:
--source "ADF Duplex"
Bastian
May 16th, 2013 at 08:54
Thanks for your help Bastian! I’ve actually moved away from scanbuttond now, and am using a Raspberry Pi as a dedicated printer/scanner server. Advantage is that I can use many more buttons/switches, and neater functions behind them than the one button on the scanner itself.
June 25th, 2013 at 08:55
Thanks for the info on scanbuttond, installing the experimental package on Ubuntu 13.04 works.
What I do before converting to PDF is run unpaper on the scanned images.
Change the scanimage format to --format=pnm
Run unpaper like this:
unpaper --size a4 --overwrite scan_%03d.pnm unpapered_%03f.pnm
Convert the pnm files to tiff for further processing:
for i in `ls unpapered_*`; do pnmtotiff $i > $i.tiff; done
Same as above, but with a different filename:
tiffcp unpapered_*.tiff output.tif
June 25th, 2013 at 08:56
sorry, c’n’p error, the correct line is:
unpaper --size a4 --overwrite scan_%03d.pnm unpapered_%03d.pnm
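The conversion step from the comment above could also be sketched without parsing ls output and with a plain .tif suffix. The function name convert_pnm_to_tiff is made up for illustration, and the sketch assumes pnmtotiff (from netpbm) is installed and that it is run in the directory holding the unpapered_*.pnm files.

```bash
#!/bin/bash
# Convert every unpapered_NNN.pnm in the current directory to
# unpapered_NNN.tif. Uses a glob rather than `ls`, and quotes the names.
convert_pnm_to_tiff() {
    local f
    for f in unpapered_*.pnm; do
        [ -e "$f" ] || continue            # the glob matched nothing
        pnmtotiff "$f" > "${f%.pnm}.tif"   # swap the .pnm suffix for .tif
    done
}

# Intended use, after unpaper has run:
#   convert_pnm_to_tiff
#   tiffcp unpapered_*.tif output.tif
```

Because the output names end in .tif rather than .pnm.tiff, the tiffcp glob picks them up directly.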