Last time, we planned to look at the MuPDF API for the last remaining task. Before moving forward, however, we should step back and look at where we are going and why we are doing each thing.
Brainstorming
First, we will look at UI functionality, then move on to the background processing needed to support each piece of it.
UI
- Display
- Navigation
  - back, forward
  - jump to page
  - history list (what was the last page I was at?)
- Reading order: may be embedded in the PDF or might require extraction
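The navigation items above (back, forward, jump to page, history list) can be sketched as a small state object that is independent of any GUI toolkit. This is only an illustration in Python, not code from the project: the class name `PageNavigator` and its methods are hypothetical, and it assumes that "back" and "forward" mean browser-style history movement and that every page change is recorded in the history.

```python
class PageNavigator:
    """Hypothetical sketch: current page plus browser-style back/forward history."""

    def __init__(self, page_count):
        if page_count < 1:
            raise ValueError("a document needs at least one page")
        self.page_count = page_count
        self.current = 1      # pages are numbered from 1
        self._back = []       # pages visited before the current one
        self._forward = []    # pages that back() moved away from

    def jump_to(self, page):
        """Go to a page and record the page we left in the history."""
        if not 1 <= page <= self.page_count:
            raise ValueError("page %d is out of range" % page)
        if page != self.current:
            self._back.append(self.current)
            self._forward.clear()  # a new jump invalidates the forward list
            self.current = page

    def next_page(self):
        self.jump_to(min(self.current + 1, self.page_count))

    def previous_page(self):
        self.jump_to(max(self.current - 1, 1))

    def back(self):
        """Return to the previously visited page, if there is one."""
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()

    def forward(self):
        """Redo a move that back() undid, if there is one."""
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()

    def last_visited(self):
        """Answer "what was the last page I was at?" (None if no history)."""
        return self._back[-1] if self._back else None
```

Keeping this state separate from the widgets means the same object could sit behind buttons, keyboard shortcuts, or a history menu. Whether simple page turns should be recorded in the history at all is a design choice that the sketch makes arbitrarily.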
Metadata
- RDF database
- Printing: cover page with metadata and QR code
- Summary: automatic summarisation
- Example old paper from 1965: A Semi-Automatic Computer-Microscope for the Analysis of Neuronal Morphology
- Note how the page layout was done manually
- BibTeX entry types table
Binding example:
- Using Inline::C to bind to libmatio
- Using Inline::C to bind to
Implementation work
We continued work on the simple Glade GUI by adding buttons for jumping to the first and last pages and for moving forward and backward one page at a time. See PR .
To speed up development, we skipped creating a binding for now and just used the mudraw command to get the PDF page as a PNG image and the pdfinfo command to get the number of pages in the PDF. The following is the result.
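As an aside on the approach itself, shelling out to the two commands could look roughly like the following. This is a minimal Python sketch, not the project's actual code. The function names and the 150 dpi default are made up for illustration, and it assumes that `pdfinfo` prints a `Pages:` line and that `mudraw` accepts `-r` (resolution), `-o` (output file), and a trailing page number.

```python
import re
import subprocess
from typing import List


def parse_page_count(pdfinfo_output: str) -> int:
    """Extract the page count from the text printed by pdfinfo."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def page_count(pdf_path: str) -> int:
    """Run pdfinfo on the file and return its number of pages."""
    result = subprocess.run(
        ["pdfinfo", pdf_path], check=True, capture_output=True, text=True
    )
    return parse_page_count(result.stdout)


def mudraw_command(pdf_path: str, page: int, png_path: str, dpi: int = 150) -> List[str]:
    """Build the argument list that renders one page (1-based) to a PNG."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return ["mudraw", "-r", str(dpi), "-o", png_path, pdf_path, str(page)]


def render_page(pdf_path: str, page: int, png_path: str, dpi: int = 150) -> None:
    """Render a single page by running mudraw; raises if the command fails."""
    subprocess.run(mudraw_command(pdf_path, page, png_path, dpi), check=True)
```

Splitting the parsing and the command construction from the actual `subprocess` calls keeps both testable without the tools installed. A proper binding would later replace `page_count` and `render_page` without touching their callers.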
Further reading
Books
These two books discuss and compare how people read on paper and on screens. Both are full of ideas.
Marshall, Catherine C. "Reading and writing the electronic book." Synthesis lectures on information concepts, retrieval, and services 1.1 (2009): 1-185.
Dillon, Andrew. Designing usable electronic text: Ergonomic aspects of human information usage. CRC Press, 2004.
Review articles
- Koolen, Corina, Ray Siemens, and Alex Garnett. "Electronic Environments for Reading: An Annotated Bibliography of Pertinent Hardware and Software (2011)." Scholarly and Research Communication 3.4 (2012).
Research articles
Hinckley, K., Bi, X., Pahud, M., Buxton, B. "Informal information gathering techniques for active reading." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2012.
See this video for a demo.
Willinsky, John, Alex Garnett, and Angela Pan Wong. "Refurbishing the Camelot of Scholarship: How to Improve the Digital Contribution of the PDF Research Article." Journal of Electronic Publishing 15.1 (2012).
This one comes with a demo PDF demonstrating various aspects of what the authors would like to achieve, which they compare against the article they modified.
As you can see, the authors opt to change the way PDFs are published, but clearly, we can't convince everyone to do that. So instead, we will need to be able to reformat documents our own way.
That will be the main contribution of Project Renard, something that nobody has done yet. I want to call it document resynthesis. I got the name "resynthesis" from a sound synthesis technique that takes existing sounds, extracts their Fourier components, and processes those components to make new sounds.