Discussion:
[Simh] best way to scan 172 column fanfold 80s printout?
Dan Gahlinger
2018-02-11 14:55:11 UTC
Permalink
I have several printouts like this,
the one I was just trying to scan in is labelled "EMPIRE Version 4.0 18-Jan-81"
with the notice: "Please send bug reports to ELROND::EMPIRE"
This is a Vax/VMS Fortran conversion from TOPS-10/20 from sources from around fall 1979
It seems I only have the first 95 pages of this printout
and it's folded width-wise, making scanning more difficult, old folds are hard to get out.

I also have Zork (Vax/VMS) and of course several different iterations of Trek7 (Vms)
somewhere I have a copy of Adventure (Colossal Cave) and the "Castle" game I love so much.

so I guess question 1: how best to get rid of the folds? my method so far: fold them the other direction and try and fold it out, but so far not much luck
and 2: how best to scan 100s of wide fanfold printout pages?

I wish someone in Toronto had converted an old teletype and put a camera on it, that would be brilliant!

Dan.
Zane Healy
2018-02-11 16:09:17 UTC
Permalink
The best way might be a piece of glass (to keep the paper flat), a copy stand, and a high-MP DSLR. Lighting in that situation would be… interesting. I’m not sure how much a polarizer on the lens would help. One option might be to put it on a light table, but I think that would create an interesting/unreadable mess. Actually, less light might be better; simply go with longer exposures.

There are graphic arts scanners that will do large pages, but in the art reproduction world the method above (normally minus the glass) is more common. You’re lucky: you’re looking to copy something that doesn’t need to be 1200dpi or better. I know you can get up to at least the 12x18 range with a scanner. I’m currently looking for either one of these, or ideally a drum scanner capable of handling 11x14 negatives. Right now the only way I have to get a digital copy of photos taken with my 11x14 camera is to photograph the prints.
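
One rough way to judge whether a camera is "enough" for a listing is to work out how many pixels land on each printed column. The sketch below is only arithmetic under stated assumptions: the function names are made up for illustration, and the 6000-pixel frame width and 15-inch sheet used in the usage note are example figures, not values from this thread.

```python
def pixels_per_character(frame_px_wide: int, columns: int,
                         fill_fraction: float = 1.0) -> float:
    """Horizontal pixels available for each printed column.

    fill_fraction is the share of the frame width that the printed
    line actually occupies (1.0 means the line spans the whole frame).
    """
    if columns <= 0 or not 0 < fill_fraction <= 1:
        raise ValueError("columns must be positive and 0 < fill_fraction <= 1")
    return frame_px_wide * fill_fraction / columns


def effective_dpi(frame_px_wide: int, paper_width_in: float,
                  fill_fraction: float = 1.0) -> float:
    """Approximate scan resolution when the sheet spans fill_fraction of the frame."""
    if paper_width_in <= 0 or not 0 < fill_fraction <= 1:
        raise ValueError("paper width must be positive and 0 < fill_fraction <= 1")
    return frame_px_wide * fill_fraction / paper_width_in
```

For example, `pixels_per_character(6000, 172)` gives a little under 35 pixels per column, and `effective_dpi(6000, 15.0)` gives 400, both assuming the page fills the frame edge to edge.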

Zane
Pär Moberg
2018-02-11 16:18:21 UTC
Permalink
Look at the DIY book scanning community for inspiration, and make sure that
the light comes in at an angle that doesn't reflect into the camera.
I just found an LED light fixture that pumps out a lot of light and is as long
as a "tube light": 1.2 m (approximately 1.3 yards).
//Pär

Timothe Litt
2018-02-11 17:45:37 UTC
Permalink
These opportunities keep coming up; lots of us archived paper, which
survives longer than magnetics - and their transports.

These seem to be addressed as one-off projects.  It would be more
efficient if a group of interested people could develop/find a sponsor
for a listing -> code facility.  But that may be just a dream.

Scanning paper efficiently requires an investment.  This would seem to
be something that could best be centralized (or regionalized).  Al
Kossow (chm/bitsavers) has hardware for efficiently scanning manuals,
but I don't know if it handles 11 x 17 (line printer) pages.  But he's
not centrally located - and the really scarce resource is labor.

Scanning code is a bit different from scanning books.  Listings tend to
have headers, footers, (tractor feed holes), notations - in some cases,
assembly code or other columns - separate from the code.  Plus lines
and/or colored bars.  And while the font will be consistent & monospace,
ribbons don't always produce crisp impressions.  They fade; the paper
isn't acid-free; zero and O aren't interchangeable, and spaces matter. 
You want to end up with code that can be compiled - with minimal manual
intervention.  So you will want to be able to OCR the result, without a lot
of fixups.  And you need to be able to either select the desired source
code, or reliably post-process to extract it.  So getting every
character (including spaces) right matters, and skew that might be
tolerated in a book becomes a problem with listings.  On the other hand,
if the tractor feed holes haven't been detached, it ought to be possible
to adapt a printer as a pretty good transport.  Printers like the DEC
LA120 have the necessary stepping motors, optical encoders, power, and
are microprocessor controlled.  Some line printers have 4 tractor
drives, which can hold paper flatter than the 2 of serial printers.  But
these are more power hungry and a bit more work to adapt.
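
Since zero and O are not interchangeable, one cheap post-processing aid is to flag the places in the OCR output where a mix-up is most likely, so a human can check them against the scan. The following is a minimal sketch of that idea; the pattern and the function name are illustrative assumptions, it only flags and never corrects, and it will produce some false positives on identifiers that legitimately contain a zero between letters.

```python
import re

# Letter O touching a digit, or digit 0 sandwiched between letters.
# Both are typical OCR confusions; this is a heuristic, not a proof of error.
SUSPECT = re.compile(r"(?<=\d)O|O(?=\d)|(?<=[A-Za-z])0(?=[A-Za-z])")


def flag_suspects(lines):
    """Yield (line_number, column, text) for each suspicious O or 0.

    Line numbers and columns are 1-based so they match what an editor shows.
    """
    for lineno, text in enumerate(lines, start=1):
        for match in SUSPECT.finditer(text):
            yield lineno, match.start() + 1, text
```

Run over the OCR text of a page, this yields a short review list instead of requiring every line to be proofread with equal care.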

In any case, the problem is that building something efficient is a
Project; it would be really useful in the grand scheme of things.  But
for any one recovery, it always seems better to just stumble along with
something ad-hoc. So local optimization, as often is the case, wins over
global.

Here are some starting points from the book world:

https://www.diybookscanner.org
https://arstechnica.com/gadgets/2013/02/diy-book-scanning-is-easier-than-you-think/
https://www.wired.com/2010/04/the-20-diy-book-scanner/
https://makezine.com/projects/make-41-tinkering-toys/diy-book-scanner/
http://scantailor.org
https://www.theverge.com/2012/11/13/3639016/google-books-scanner-vacuum-diy

P.S. I once worked in a very small company; cash was short.  We used
each sheet of printer paper four times, and never burst it.  Front and
back, of course.  But it turns out that most of our listings were
left-skewed.  So turning the paper up-side down and printing the right
side was adequate for working listings.  There was minimal overlap.  I
wouldn't want to scan those :-)

Al Kossow
2018-02-11 18:12:11 UTC
Permalink
On 2/11/18 9:45 AM, Timothe Litt wrote:

> Scanning paper efficiently requires an investment.  This would seem to be something that could best be centralized (or
> regionalized).  Al Kossow (chm/bitsavers) has hardware for efficiently scanning manuals, but I don't know if it handles
> 11 x 17 (line printer) pages.  But he's not centrally located - and the really scarce resource is labor.


I have a Panasonic KV-S3065C scanner which can scan double-sided anything 11.2" wide up to about 100" long.

That is how I deal with most engineering drawings and listings.
I also have a blueprint-sized scanner and a microfiche scanner, both currently off-line.

The main problem with listings is when the print is misaligned and crosses the perf,
since they have to be burst to be scanned.
Dan Gahlinger
2018-02-11 18:11:16 UTC
Permalink
Yes it comes up time and again, and book scanning projects only work so well.

which is why I wondered what people thought of turning an old DEC teletype or printer into a scanner, by fixing a camera to it,
you could even start by turning pages manually.
at least the tractor feed would hold the pages evenly.
you could even just use a hand held camera with the right lighting.

light tables and other things too, or just put it on a table and photo each page.
no matter how you slice it, tedious.

The Zork VMS listing is 6 inches thick; it's no wonder I've never tried it...

Dan.
Al Kossow
2018-02-11 18:17:08 UTC
Permalink
On 2/11/18 10:11 AM, Dan Gahlinger wrote:

> which is why I wondered what people thought of turning an old DEC teletype or printer into a scanner, by fixing a camera
> to it

sounds like a bigger version of the Thunderscan
https://www.folklore.org/StoryView.py?story=Thunderscan.txt
Davis Johnson
2018-02-11 19:29:12 UTC
Permalink
I think what you need is a wide carriage printer with the typical feed
up through a slot in the bottom, and a camera.

The only working function needed from the printer is form feed.
Photograph the page that is hanging below the printer, form feed and repeat.

Anybody here ought to be able to handle the programming to automate this
process.

You would need to manually photograph the first page.

The camera would need good depth of field.
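
The photograph, form feed, repeat cycle described above can be sketched as a small loop. This is only an outline under assumptions: `send_to_printer`, `take_photo` and `wait_for_operator` are hypothetical hooks the caller must supply (for instance, a write to whatever port the printer is on, and a trigger for whatever camera is used), and the loop assumes the first page has already been positioned in front of the camera by hand. The only hard fact used is that form feed is the ASCII control character 0x0C.

```python
from typing import Callable, List

FORM_FEED = b"\x0c"  # ASCII FF: advance the paper to the top of the next form


def capture_listing(
    pages: int,
    send_to_printer: Callable[[bytes], None],
    take_photo: Callable[[int], str],
    wait_for_operator: Callable[[int], None] = lambda page: None,
) -> List[str]:
    """Photograph `pages` pages, sending one form feed between shots.

    Returns whatever take_photo returned for each page (e.g. file names).
    """
    shots = []
    for page in range(1, pages + 1):
        wait_for_operator(page)         # optional pause, e.g. wait for a key press
        shots.append(take_photo(page))  # photograph the page currently in view
        if page < pages:
            send_to_printer(FORM_FEED)  # bring the next page into view
    return shots
```

Passing a `wait_for_operator` that blocks on a key press turns the same loop into an attended, one-page-per-confirmation mode.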


Timothe Litt
2018-02-11 20:10:36 UTC
Permalink
On 11-Feb-18 14:29, Davis Johnson wrote:
> I think what you need is a wide carriage printer with the typical feed
> up through a slot in the bottom, and a camera.
>
> The only working function needed from the printer is form feed.
> Photograph the page that is hanging below the printer, form feed and
> repeat.
>
> Anybody here ought to be able to handle the programming to automate
> this process.
>
> You would need to manually photograph the first page.
>
> The camera would need good depth of field.
>
>
It's not that simple.  You need to deal with at least 2 common vertical
pitches (6 & 8 LPI), and a number of page lengths (and widths).  These
need to be setup per job; not all printers support all these.  Plus,
misalignment (as Al noted, crossing the perforations at the bottom of a
page is quite common).  The OP mentioned that his listings have a hard
crease; this will cause (at least) feed and stacking problems.  Form
feed causes a high-speed slew; this becomes less reliable as the
distance moved increases.  You're proposing an entire page at a time -
which means that the paper will jump off the tractors frequently.[1] 
Old paper is fragile.  Over hundreds of pages, dimensions may not be
stable; it was not uncommon to have to re-adjust TOF after a while. 
There's a fair bit of error detection and recovery to work out.
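
To make the per-job setup concrete, here is a small worked calculation of how form length and vertical pitch interact. It is a sketch, not a printer driver: the function names are illustrative, and the half-inch tractor-hole spacing is an assumption (the usual pitch for continuous forms) that should be checked against the actual paper.

```python
from fractions import Fraction

HOLE_PITCH_IN = Fraction(1, 2)  # assumed tractor-hole spacing, in inches


def lines_per_page(form_length_in, lpi: int) -> int:
    """Whole print lines that fit on one form at the given lines per inch."""
    return int(Fraction(form_length_in) * lpi)


def holes_per_page(form_length_in) -> int:
    """Tractor holes per form; raises if the form is not a whole number of holes."""
    holes = Fraction(form_length_in) / HOLE_PITCH_IN
    if holes.denominator != 1:
        raise ValueError("form length is not a multiple of the hole pitch")
    return int(holes)
```

With an 11-inch form this gives 66 lines at 6 LPI, 88 lines at 8 LPI, and 22 holes per page, so the same paper needs different line counts depending on how the original job was printed.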

Lighting is an issue, as is compensating for keystoning and other
misalignments.  Most cameras don't have a standard remote trigger
interface - one of the pointers I provided loads modified firmware into
cameras from one manufacturer to make this work.  If you look at digital
camera reviews, you'll see that the lenses have varying degrees of
artifacts, especially at the edges.  So you need to find and zoom to an
area that's relatively "flat" & doesn't need a lot of correction.  While
depth of field will help, it also will result in apparent font size
changes as paper sways forward and back.  If you stop that, you simplify
the OCR - and don't need as much depth of field.

There are many backgrounds that need to be subtracted for OCR to work. 
(Printer paper was notorious for institutional logos, as well as bars
and other aids to human readers.)  Then there are the other issues
mentioned in my earlier note.

It seems simple, but it is a P.roject.  That's a capital P.  With a lot
of roject to work out.

It's worthwhile, but it's not simple.  It's a pretty interesting
hardware (and software) project.  I don't mean to discourage anyone who
wants to work on it - but you need to go in with eyes open, or you'll
end up very, very frustrated.

Thunderscan tried to scan line by line & retrieve grayscale; the
challenges were piecing together the adjacent lines with pixel
resolution.   The focal distance was constant because the camera was on
a carriage.  The idea here is to capture a page per frame.  So the
registration problems are quite different.  One could try the
thunderscan approach; it would trade one set of problems xxx "challenges
and opportunities" for another.

[1] In my experience, with many brands and models of tractor feed
printers over many years.  Paper handling is really difficult to get right.

Davis Johnson
2018-02-11 23:32:43 UTC
Permalink
Almost every wide printer had adjustable widths. If this is a problem
try a different printer. Non-standard lengths may be more of an issue.

If you shoot an entire page at a time, line spacing is a problem for your
OCR software.

With tractor feed page length variability should not be an issue - every
page will have the same number of holes. I almost always ran whole boxes
of paper without adjusting top of form. On the big printers with dual
tractors I could start a new box without adjusting top of form. Some
printers may not repeatably feed exactly an integer number of holes each
time -- I didn't experience this.

Friction feed is another matter, and accumulated form feed errors will
present exactly the problems you describe. Many interesting listings
probably don't have tractor holes.

Problems feeding and stacking will prevent this from being an unattended
operation. Perfs will be weak - both the perfs between the page and the
tractor holes and the perfs between pages. A Data Products B1200 would
sometimes break perfs on new paper. Some consumer model dot-matrix
printers had gentle form feeds. Having a manual feed mode, requiring a
button push for each page, would probably be a good idea.

Upfront camera setup care would be required. Proper camera selection and
position should minimize keystone, pincushion, barrel etc. distortion.
Even lighting would be required - much as would be used with a copy
stand. I am somewhat more worried that the paper would be hanging
loosely from the printer, not sandwiched between glass. The camera will
need a small enough aperture to keep the entire sheet in focus while the
paper does whatever the heck it wants to. It gets into the basic
principle that you have to get the analog part right if you want to
successfully digitize.

I probably did understate the effort. There would be work to do, but no
part of it seems to be intractable to me. If worse came to worst and you
had to ditch the printer's control electronics and drive the feed
steppers directly you would still be ahead not having to build a paper
transport from scratch.

I remember wanting one of those thunder scan devices. In this case I
think that approach would cause more problems than it would solve.

OCRing the result, however you collect the images, is likely the hardest
part anyway.



Alan Frisbie
2018-02-12 00:45:54 UTC
Permalink
On 02/11/2018 03:32 PM, Davis Johnson wrote:

> OCRing the result, however you collect the images, is likely the
> hardest part anyway.

Totally true. Many printers produced listings that were sometimes
difficult for humans to read!

Alan Frisbie
Carey Tyler Schug
2018-02-12 12:50:38 UTC
Permalink
Here in the Chicago suburb of Des Plaines the public library has an
11x17 scanner available for free (flatbed).  Scan your pages, then email
them to yourself.  Don't even need a library card.  It is a prepackaged
service they buy so I am sure many other libraries have it too.
Zane Healy
2018-02-12 02:10:34 UTC
Permalink
> On Feb 11, 2018, at 12:10 PM, Timothe Litt <***@ieee.org> wrote:
>
> Lighting is an issue, as is compensating for keystoning and other misalignments. Most cameras don't have a standard remote trigger interface - one of the pointers I provided loads modified firmware into cameras from one manufacturer to make this work. If you look at digital camera reviews, you'll see that the lenses have varying degrees of artifacts, especially at the edges. So you need to find and zoom to an area that's relatively "flat" & doesn't need a lot of correction. While depth of field will help, it also will result in apparent font size changes as paper sways forward and back. If you stop that, you simplify the OCR - and don't need as much depth of field.

Even with a copystand getting everything lined up correctly is a pain. I have a little trick I use when using a DSLR with a Macro lens on one of my copystands. I have a pile of heavy/flat books to get enough height that I can rest the rim of the lens on the books. I use this to get it parallel to the surface of the stand, then tighten the camera down.

I’ve never tried doing this with anything that needs to be OCR’d. I’ve mainly used the setup for glass slides, but next up for me will be some 200-year-old art that I’ve been asked to reproduce.

Zane
Carey Tyler Schug
2018-02-12 13:07:43 UTC
Permalink
Hanging below may be the best idea on this sub-thread.  With the page hanging
below, one could BACK LIGHT the paper, which would make even lighting much
easier: a milky white piece of glass with several lights behind it, where
they would block the camera if they were in front.  It would require
compensation for the green bar fanfold, but maybe greenish lights would
help with that?
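
A software counterpart to greenish lighting is to look only at the green channel of a colour photograph: a pale green bar is bright in that channel, while dark ribbon ink is dark in every channel. The sketch below illustrates the idea on plain lists of RGB tuples; the function name and the threshold of 128 are assumptions chosen for illustration, and real scans with faded ink or coloured ribbons would need the threshold tuned.

```python
def green_channel_binarize(pixels, threshold=128):
    """Turn rows of (r, g, b) pixels (0..255) into rows of 0 (ink) / 1 (paper).

    Only the green value is examined, so light green bars are classified
    as paper while dark ink is kept.
    """
    return [
        [0 if green < threshold else 1 for (_red, green, _blue) in row]
        for row in pixels
    ]
```

For instance, a white pixel, a pale green bar pixel such as (200, 235, 200), and a dark ink pixel such as (40, 50, 40) come out as 1, 1 and 0 respectively.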


> printers over many years.  Paper handling is really difficult to get
> right.
> http://mailman.trailing-edge.com/mailman/listinfo/simh
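
A rough Python sketch of the photograph / feed / repeat loop discussed above. The `send` and `capture` callables are hypothetical stand-ins for whatever printer port and camera trigger are actually available; nothing here is a real driver API. Picking up the point that a full-page form feed slew is unreliable, this version advances the paper with short bursts of line feeds instead (it assumes the printer moves the paper on a bare line feed), and the number of line feeds is just the form length times the printer's vertical pitch. Jam detection and recovery, which is where most of the work would be, is left out.

```python
from typing import Callable

LINE_FEED = b"\n"


def line_feeds_per_page(page_length_in: float, lpi: int) -> int:
    """Line feeds needed to advance one form at the printer's current pitch."""
    return round(page_length_in * lpi)


def capture_listing(
    pages: int,
    send: Callable[[bytes], None],    # hypothetical: writes bytes to the printer
    capture: Callable[[int], None],   # hypothetical: triggers the camera for page n
    page_length_in: float = 11.0,     # assumed form length in inches
    lpi: int = 6,                     # printer pitch setting, 6 or 8
    burst: int = 6,                   # line feeds per burst, to avoid a long slew
) -> None:
    feeds = line_feeds_per_page(page_length_in, lpi)
    for page in range(1, pages + 1):
        capture(page)
        if page == pages:
            break                     # last page: nothing left to advance to
        remaining = feeds
        while remaining > 0:
            n = min(burst, remaining)
            send(LINE_FEED * n)
            remaining -= n
```

With an 11 inch form this gives 66 line feeds per page at 6 LPI and 88 at 8 LPI, which is why the pitch and form length have to be set per job.
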
John H. Reinhardt
2018-02-13 14:00:19 UTC
On 2/11/2018 12:17 PM, Al Kossow wrote:
>
>
> On 2/11/18 10:11 AM, Dan Gahlinger wrote:
>
>> which is why I wondered what people thought of turning an old DEC teletype or printer into a scanner, by fixing a camera
>> to it
>
> sounds like a bigger version of the Thunderscan
> https://www.folklore.org/StoryView.py?story=Thunderscan.txt

I've still got one, along with the ImageWriter II that I used it with. The article was pretty much correct. The Thunderscan could work really well or really not; it just depended on what you were scanning and how well the ImageWriter felt like working that day.

John H. Reinhardt
Alan Frisbie
2018-02-11 18:20:44 UTC
On 02/11/2018 09:45 AM, Timothe Litt wrote:

> Scanning code is a bit different from scanning books. Listings tend
> to have headers, footers, (tractor feed holes), notations - in some
> cases, assembly code or other columns - separate from the code.
> Plus lines and/or colored bars.

Back in 1972, the place I worked at had a large Xerox copier with a
tractor feed for copying this sort of listing. You dialed in the
page height, and it would copy your entire fanfold listing automatically.
The glass was curved, so the tractors could be adjusted to hold the
paper firm against the glass.

After all this time I doubt that any of these still exist, unfortunately.
Also, it only did copying, not scanning.

On a slightly related note:

Years ago, at a customer site, they had a Burroughs accounting machine
which loaded programs from paper tape. The service company had a
diagnostic paper tape which I very much wanted a copy of, but they would
not let it out of their sight for me to take home and copy. My solution
was to put it on the photocopier -- 16 inches at a time -- and image
it. After marking the overlaps, I trained one of the clerks to read
the holes and write the octal value next to it, then type them all into
a file. I then took the file home and punched a new tape. Of course,
there were errors, but by laying the new tape over the images, they were
quickly found and corrected. I repaid the service technician by making
him a few spare copies.
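
The "type the octal values into a file, then punch a new tape" step could look something like the Python sketch below. It assumes an 8-channel tape with one octal value per frame, separated by whitespace; the post does not say what format the Burroughs tape or the typed file actually used, so the function names and file layout are illustrative only. The second helper is a software stand-in for laying the new tape over the photocopies: it compares two independently typed copies and reports where they first disagree.

```python
def octal_listing_to_bytes(text: str) -> bytes:
    """Convert whitespace-separated octal values (one per tape frame) to bytes."""
    frames = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            value = int(token, 8)     # raises ValueError on digits 8 or 9
            if not 0 <= value <= 0o377:
                raise ValueError(
                    f"line {lineno}: {token} does not fit in 8 channels"
                )
            frames.append(value)
    return bytes(frames)


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first frame where two tapes differ, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))
```
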

Alan Frisbie
David Wijnants
2018-02-11 23:02:11 UTC
*"so I guess question 1: how best to get rid of the folds? my method so
far: fold them the other direction and try and fold it out, but so far not
much luck"*

If you have a big wad of paper that has been folded in half newspaper
style, fold it the other way a few times. Then place it concave side down
on a flat surface, put a board on top of it, and weigh it down with some
books. (Book binders wrap bricks in paper for this purpose.) Leave it for a
few weeks or even months, the longer the better. You should end up with a
listing that is flat enough to work with even if it isn't perfect. If each
page is individually folded into a hard crease, then you'll just have to
deal with the unwanted line yourself (see Photoshop/GIMP tips below).

*"and 2: how best to scan 100s of wide fanfold printout pages?"*

Scanners are very slow. A camera is *much* faster.

Buy a second hand DSLR with "kit lens" from eBay or Craigslist. A ten
year old Canon 350D/Rebel is more than capable, resolution wise. Nikon and
others are just as good. Put it on a sturdy tripod, preferably with a
tiltable or repositionable centre column. (If you enquire at your nearest
camera club, someone may even be willing to help you. A good tripod can
cost a lot of money, so borrow one if you can.) A cable release suitable
for your camera model will also be a big help and costs next to nothing on
eBay.

You don't need a full frame camera (expensive), a "crop sensor" works fine.
Also, don't use a point and shoot digicam, because these tend to have tiny
sensors that need a larger subject distance if you want to avoid
distortion. A cheap DSLR is fine.
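
As a back-of-the-envelope check on the resolution claim, here is a small Python calculation of the pixels per inch that land on the paper. The numbers are assumptions, not measurements: 3456 x 2304 pixels for a Canon 350D image, two open sheets of 14 7/8 x 11 inch fanfold covering roughly 22 x 14.875 inches, the paper filling about 90% of the frame, and 10 characters per inch.

```python
def effective_dpi(sensor_px, subject_in, fill=0.9):
    """Pixels per inch on the paper when the subject fills `fill` of the frame.

    sensor_px  -- (long_side, short_side) of the image in pixels
    subject_in -- (long_side, short_side) of the photographed area in inches
    """
    long_dpi = sensor_px[0] * fill / subject_in[0]
    short_dpi = sensor_px[1] * fill / subject_in[1]
    return min(long_dpi, short_dpi)   # the tighter axis limits the result


# Assumed camera and paper dimensions, see the note above.
dpi = effective_dpi((3456, 2304), (22.0, 14.875))
pixels_per_char = dpi / 10            # assuming 10 characters per inch
```

Under those assumptions this works out to about 140 pixels per inch, or around 14 pixels per character. Whether that is enough depends on the OCR package and the print quality; photographing one sheet per frame instead of two raises the figure.
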

Place the listings on the floor, showing two consecutive sheets, with the
stack evenly distributed so you have an approximately equal number of folds
top and bottom. The camera, of course, faces down from above. Use a couple
of desk lamps or mirrors to light the subject from a 45 degree angle. If
you end up with harsh light and dark areas, try diffusing the light with some
tracing paper or chiffon. If you are in a well lit room with light coming
from all directions then you may not even have to mess with extra lights or
mirrors.

Place some masking tape (blue painters' tape) around your fanfold to help
keep things in the same place. With a bit of tripod and camera adjustment
you should be able to get a full two sheets in your viewfinder, along with
the masking tape, and enough of a margin to help with alignment. Adjust the
camera on the tripod head so the image comes out as close as possible to
being a rectangle. The blue masking tape will help you with this.

Camera settings:

- Zoom your lens to around 50mm. For example, if you have an 18-55mm kit
lens, use 55mm. That gives you the least distortion. Avoid the "wide" end
of the lens (e.g. 18mm) because that's where you tend to get more
"pincushion" or "barrel" distortion.
- Set your image quality to Large/Fine JPEG.
- Put the camera into aperture priority (Av on Canon).
- Set the aperture to one or two stops from the lowest number. For
example, if you are using an 18-55mm, the lowest number (largest aperture)
is typically something like 3.5 or 4, so the best setting will be something
like 5.6 or 8. This will get you the best sharpness for your lens. Depth
of field should be fine if your listings are flat enough and you are not
using high magnification or extreme telephoto (which you aren't).
- The camera takes care of the shutter speed.
- If the shutter speed ends up being longer than 1/60th second, then you
can increase the ISO value (camera sensitivity) to 400 or so to get faster
shutter action and less risk of camera shake.
- Use manual focus. Most lenses have a little lever on the side to
switch between AF/MF.
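
The aperture and ISO advice above follows from two simple relations, sketched here in Python: each full stop multiplies the f-number by the square root of 2, and at a fixed aperture the shutter time for the same exposure scales inversely with ISO. The function names are made up for illustration, and the 1/15 second starting point at ISO 100 is only an assumed example.

```python
import math


def stop_down(f_number: float, stops: float) -> float:
    """f-number after closing the aperture by `stops` full stops."""
    return f_number * math.sqrt(2) ** stops


def shutter_after_iso_change(shutter_s: float, iso_from: int, iso_to: int) -> float:
    """Shutter time giving the same exposure at a new ISO, aperture unchanged."""
    return shutter_s * iso_from / iso_to


one_stop = stop_down(4.0, 1)      # about 5.66, marked f/5.6 on the lens
two_stops = stop_down(4.0, 2)     # 8.0, up to floating-point rounding
faster = shutter_after_iso_change(1 / 15, 100, 400)   # about 1/60 s
```

So a lens that opens to f/4 is at f/5.6 one stop down and f/8 two stops down, and going from ISO 100 to ISO 400 turns a 1/15 second exposure into roughly 1/60 second.
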

Then it's a matter of focusing and pressing the shutter release. Turn a
page, press the shutter, turn a page, press the shutter. Every couple of
shots, redistribute your fanfold so you have approximately equal folds top
and bottom. For example, every time you move five folds from top to bottom,
also move five folds from the bottom half back to the top. That keeps the
stack relatively level and the edges nice and parallel. While you're at it,
re-check the focus every time you redistribute your fanfold.

Keep doing this until you get back to the start of your listing. You will
very quickly get into a certain rhythm turning pages and firing the
shutter. Before you know it, you will have gotten through 100s of pages. If
you have someone around who can help check the focus and press the shutter
while you turn the pages, that obviously helps to speed things up further.

When you have photographed all your listings, go outside and start to enjoy
your new photography hobby.

To convert your colour photographs to black and white text only images, the
following Photoshop or GIMP concepts are worth learning to use:

- Levels. This is a trio of black, white and neutral sliders that can
often make your background disappear almost by magic. You'll also use this
in ordinary digital photography to bring out detail in your shadow areas.
- Channels. If your paper has green or blue bars, then you can
selectively remove those colours to make the bars disappear, hopefully with
very little damage to your text.
- If you have an ugly black line down the middle of each page (i.e. your
pages were folded sharply in half), then you can quite easily cut and paste
one half of your image to shift it a few pixels to the left or right. That
will inevitably distort a letter or two, but OCR software is pretty good at
dealing with the odd imperfection.
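
For anyone who would rather script these three steps than click through them, here is a minimal pure-Python sketch of the same ideas. It works on an image held as rows of (r, g, b) tuples so that it needs no libraries; in practice an imaging library would be faster. The black and white points (80 and 170) and the fold position are made-up values that would have to be tuned per listing.

```python
def green_channel(rgb_rows):
    """Channels: keep only green. Green bars come out nearly as bright as
    white paper, while black ink stays dark. Use blue for blue bars."""
    return [[g for (_r, g, _b) in row] for row in rgb_rows]


def levels(gray_rows, black=80, white=170):
    """Levels: values at or below `black` become 0, at or above `white` 255."""
    span = white - black

    def stretch(v):
        return max(0, min(255, (v - black) * 255 // span))

    return [[stretch(v) for v in row] for row in gray_rows]


def close_fold(gray_rows, fold_start, fold_width):
    """Cut out the columns covered by a dark fold line, pulling the right
    half of the page left by `fold_width` pixels."""
    return [row[:fold_start] + row[fold_start + fold_width:] for row in gray_rows]


# One row: white paper, green bar, ink on the bar, ink on white paper.
sample = [[(245, 245, 240), (200, 235, 200), (40, 50, 40), (30, 30, 30)]]
cleaned = levels(green_channel(sample))   # paper and bar -> 255, ink -> 0
```
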

OCR software often does all this automatically. Apparently, code tends to
scan really well if you tell your software not to use a dictionary. (I've
only used SimpleOCR myself, and that only with English text.)


Hope this helps,
David.


On 11 February 2018 at 16:09, Zane Healy <***@avanthar.com> wrote:

> On Feb 11, 2018, at 6:55 AM, Dan Gahlinger <***@hotmail.com> wrote:
>
> I have several printouts like this,
> the one I was just trying to scan in is labelled "EMPIRE Version 4.0
> 18-Jan-81"
> with the notice: "Please send bug reports to ELROND::EMPIRE"
> This is a Vax/VMS Fortran conversion from TOPS-10/20 from sources from
> around fall 1979
> It seems I only have the first 95 pages of this printout
> and it's folded width-wise, making scanning more difficult, old folds are
> hard to get out.
>
> I also have Zork (Vax/VMS) and of course several different iterations of
> Trek7 (Vms)
> somewhere I have a copy of Adventure (Colossal Cave) and the "Castle" game
> I love so much.
>
> so I guess question 1: how best to get rid of the folds? my method so far:
> fold them the other direction and try and fold it out, but so far not much
> luck
> and 2: how best to scan 100s of wide fanfold printout pages?
>
> I wish someone in Toronto had converted an old teletype and put a camera
> on it, that would be brilliant!
>
> Dan.
>
>
> The best way might be a piece of glass (to keep the paper flat), a copy
> stand, and a high-MP DSLR. Lighting in that situation would be
> interesting. I’m not sure how much a polarizer on the lens would help.
> One option might be to put it on a light table, but I think that would
> create an interesting/unreadable mess. Actually less light might be
> better, and simply go with longer exposures.
>
> There are graphic arts scanners that will do large pages, but in the art
> reproduction world, the method above (normally minus the glass), is more
> normal. You’re lucky, you’re looking to copy something that doesn’t need
> to be 1200dpi or better. I know you can get up to at least 12x18 range
> with a scanner. I’m currently looking for either one of these, or ideally
> a drum scanner capable of handling 11x14 negatives. Right now the only way
> I have to get a digital copy of photo’s taken with my 11x14 camera, is to
> photograph the prints.
>
> Zane
Bob Supnik
2018-02-11 18:53:42 UTC
Zork (Dungeon) for VAX/VMS is available here: http://simh.trailing-edge.com/games/dungeon.zip

The sources to Adventure (VAX/VMS version) are also online, as is the MDL source for Zork. I have the PDP-11 version of Adventure as well.

/Bob

On 2/11/2018 1:22 PM, simh-***@trailing-edge.com wrote:
> zork vms is 6 inches thick, it's no wonder i've never tried it...
>
> Dan.