Google Should Buy the Entire Publishing Industry

The creation of books is a cottage industry: solitary artisans or small teams labor away in private to produce eccentric products tailored to the recondite needs of an audience which may be tiny (in extreme cases of academic publishing, the reader community may number a couple of hundred, worldwide) and is in any event very small by mass media standards. Even a wildly successful global-scale bestseller is unlikely to rival the audience (much less the profitability) of a mid-ranking Hollywood movie. Publishing as an industry is only large because it is so prolific, with roughly 300,000 trade books published per year in the US (“Books published per country per year”); it may come as a surprise to realize that the global turnover of the publishing industry is around the $20 billion mark.

Moreover, much of this turnover is absorbed by the supply chain. Amazon alone has three times the turnover of the Big Five multinational publishing conglomerates (which account for 80% of the industry’s revenue). Publishing is, quite simply, not a very profitable industry sector – it’s labor-intensive and inefficient, and the only reason we put up with it is (to paraphrase Winston Churchill) that all the alternatives are worse.

Part of the reason we put up with the system is that it gives authors (the “we” in this context) a mechanism for remuneration. Writing is hard brain-work. Worse, non-authors underestimate it. (Most people have the basic literacy skills to read a book, and also to construct a sentence or write a paragraph. Books are, to a first approximation, just lots of paragraphs strung together: “so why shouldn’t I write a novel?” thinks the lay reader. This ignores the fact that a novel is structurally different from a high school essay the way a wide-body airliner is structurally different from a balsa-wood toy glider: there’s a complexity angle that isn’t immediately obvious. But I digress.) So books and the labor that goes into making them are persistently undervalued.

The mechanism by which working authors currently earn a living is copyright licensing. We automatically own copyright – literally, the right to control copying – over material we have invented. If we’re successful, we license the right to make copies to a publisher, who sells copies to the general public and pays us a pro-rata share of their receipts (a royalty).

There’s an interesting paradox implicit in the copyright/royalty licensing paradigm, of course. The more expensive the product, the more money the author receives per copy – but the fewer the customers. Consumers are convinced that anyone can write a book: how hard can it be? So the idea of charging, say, $10,000 a copy for a novel strikes them as ludicrous, even if the work in question took the author years of hard work to produce. In economic theory, the measure of how strongly demand responds to a change in price is called the price elasticity of demand.

Books are problematic: it turns out that e-books in particular suffer a drastic drop in demand if the cover price exceeds a very low threshold – around $4.99 in the US market. This is considerably lower than the price of a mass market paperback, much less a hardcover: consumers, it would appear, value the information content of a book less highly than the physical object itself.

As an author I have two goals. I want to maximize my income, and I want to maximize my readership. But by seeking to maximize income per copy sold, I may inadvertently minimize the number of copies sold, i.e. minimize my readership. The two goals are not merely orthogonal; they may be in conflict.
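To make the trade-off concrete, here is a minimal Python sketch of the arc (midpoint) price elasticity of demand, alongside an author’s royalty income and readership at two price points. The sales figures and the flat 10% royalty rate are purely hypothetical, chosen only to illustrate the arithmetic; they are not real market data, and the function names are invented for this example.

```python
def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Midpoint (arc) price elasticity of demand between two (price, quantity) points."""
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


def author_outcome(price: float, copies: int, royalty_rate: float = 0.10) -> tuple[float, int]:
    """Return (royalty income, readership), assuming a flat royalty on the cover price."""
    return price * copies * royalty_rate, copies


# Hypothetical demand: raising an e-book from $4.99 to $9.99 cuts sales from 10,000 to 3,000.
e = arc_elasticity(4.99, 10_000, 9.99, 3_000)
cheap_income, cheap_readers = author_outcome(4.99, 10_000)
dear_income, dear_readers = author_outcome(9.99, 3_000)

print(f"elasticity: {e:.2f}")
print(f"$4.99: income ${cheap_income:,.2f}, readers {cheap_readers:,}")
print(f"$9.99: income ${dear_income:,.2f}, readers {dear_readers:,}")
```

Under these made-up numbers demand is elastic (the elasticity is below -1), so the price rise loses more sales than it gains per copy and both income and readership fall. Only where demand is inelastic (elasticity between -1 and 0) does raising the price increase income while shrinking readership, which is where the two goals collide.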

Anyway, this brings me to an interesting thought experiment: what would be the consequences if a large internet corporation such as Google were to buy the entire publishing industry?

Bear in mind that Google or Apple has a sufficiently large cash pile that they could take out a majority stake in all of the Big Five – it would only take on the order of $10 billion. Also bear in mind that the paper publication side of these organizations could remain largely unaffected by this takeover, insofar as they could still be operated as profitable commercial business units. The focus of the takeover by Google would be on the electronic side of the industry. The purchaser would effectively have acquired the exclusive electronic rights to roughly 300,000 commercial-quality books per year in the US market space. They could provide free public access to these works in return for a royalty payment to authors based on a formula extrapolating from the known paper sales, or a flat fee per download; or they could even put the authors on payroll. The cost would be on the order of a few billion dollars per year – but the benefit would be a gigantic pool of high-quality content.

From an author’s point of view, the benefits should be obvious. Having your books given away free by FaceAppleGoogBook maximizes your potential readership, while retaining print royalties and some sort of licensing stipend from FaceAppleGoogBook should maintain your income stream. Win on both counts!

Such a buyout would amount to a wholesale shift to a promotion-supported model for book publishing. Google would presumably use free book downloads to drive targeted advertising and collect information about their users’ reading habits and interests. Apple might use the enormous free content pool as a lure for a shiny new proprietary iReader hardware device. Facebook could target the authors, wheedling them to pay for promotional placement in front of new readers. The real questions are: is there enough money in a new shiny iReader device or the AdWords market (indeed, the advertising industry as a whole) to support the publishing sector as a promotional loss-leader; and, would this get FaceAppleGoogBook something they don’t already have?

Perhaps we should ask why they haven’t done this already.

The dismal answer probably lies in the mare’s nest of contracts and licensing agreements and legal boilerplate that underpins the publishing industry. The 300,000 books/year figure implies 300,000 legal contracts per year, many of which ban advertising, or place bizarre constraints on licensing and sub-licensing and distribution through anomalous channels such as Edison wax cylinder reproduction rights and talking stuffed character toys. Untangling the e-publishing rights and renegotiating the right to distribute them for free in return for a flat payment would be a nightmare; only an algorithmic approach to massively parallel contract negotiation could succeed, and such an exercise might strain even Google’s prodigious programming capabilities. And as an afterthought, why should FaceAppleGoogBook try to buy books so that they can advertise through them, when they can plaster advertisements all over the search pages that lead readers to the books, or the commerce sites that sell them?

Looks like my utopian future as a salaried Google employee churning out Creative Commons licensed, freely downloadable novels for my enthusiastic audience (enthusiastic because everything is suddenly free – in return for their eyeballs, of course) will have to wait.

Do Zimboes Dream of Electric Sheep?

The act of reading is inextricably linked to the intertwined structures of language and consciousness.

We are conscious beings; as mammals, when we experience the world around us we weave a narrative account of our existence that gives us a retrospective timeline in which to anchor our viewpoint and sense of unitary identity. We possess a “theory of mind” which allows us to ascribe intentionality to other organisms – the dog bit the postman because it was frightened (and fear provokes a fight/flight response) – a valuable survival ability during our prehistory on the plains of Africa. And we possess language with syntax and deep semantics and grammar, a possibly unique and very powerful capability that allows us to encode behavior and insights and transfer them from one mind to another.

Cognitive philosophers have, over the years, chewed on the concept of consciousness until it is grey and mushy about the edges – but with little digestive success. One thought experiment they use to examine this phenomenon is the idea of the zombie: a human being with no interior state, no sense of identity, no “I.” Philosophical zombies do not, as far as we know, exist, but they possess a number of interesting attributes; they presumably eat, sleep, breathe and respond to stimuli, but possess no personhood. If you ask one who he or she is, or what they are experiencing, they won’t be able to frame a reply that encodes any sense of identity: they observe but they do not experience.

To probe some questions arising from philosophical zombies, Daniel Dennett proposed a new category: the “zimboe.” A zimboe is a special type of zombie which, when asked, will deny that it is a zombie. That’s its sole specialty. It’s like an empty house where the lights are on and nobody’s home, but the absent householder has left a tape-recording of a dog barking or a baby crying playing on a continuous loop to convince burglars that it’s a bad prospect. If you ask a zombie about themselves they can’t tell you anything. If you ask a zimboe about themselves they will spin a convincing yarn, but it’s a lie – they don’t feel anything. Detecting a zimboe is next to impossible because they claim to be conscious; we might be surrounded by them, or even married to one, but we might never know.

When we read fiction or autobiography or any other narrative text that encodes a human experience as opposed to some assertion about the non-human universe, we are participating in an interesting process that Stephen King described as the nearest thing to telepathy that humanity has yet developed. An author has encoded their interior experience, serialized it as text and handed it to the reader in some kind of package. The reader then inputs the text and, using their theory of mind, generates a simulation of the interior mental states the writer was encoding.

What happens when a zimboe reads Pride and Prejudice and Zombies?

The lights are on, but there’s no consciousness present and therefore no theory of mind to be deployed to generate an emulation of the interior states of Jane Austen’s characters. You can quiz the zimboe about their reading matter and they can answer factual questions about the text, but they can’t tell you why Elizabeth and Mr. Darcy are feeling any given emotion, because they lack the theory of mind – the cognitive toolkit – necessary to infer interior states and ascribe them to other entities.

We may therefore expect zimboe lairs to be curiously deficient in the kind of reading matter that provokes emotional engagement and long interior arguments with recalcitrant fictional protagonists who need to recognize the error of their ways, pull their heads out of their fictional asses and sort themselves out.

And, more fundamentally, we may infer the existence of a cast-iron test for whether a person is a person or a zimboe…because zimboes can’t write fan fic. Not even bad fan fic. They probably can’t write any kind of fiction at all, or even reliably recognize the distinction between fiction and narrative fact.

Zimboes don’t dream of electric sheep. And, come the zombie apocalypse, we can use this fact to defend ourselves from them!

Why Microsoft Word Must Die

I hate Microsoft Word. I want Microsoft Word to die. I hate Microsoft Word with a burning, fiery passion. I hate Microsoft Word the way Winston Smith hated Big Brother. Our reasons are, alarmingly, not dissimilar….

Microsoft Word is a tyrant of the imagination, a petty, unimaginative, inconsistent dictator that is ill-suited to any creative writer’s use. Worse: it is an aspiring monopolist, having nearly 80 percent of the word processing field to itself. Such dominance has brutalized the minds of software developers to such an extent that few can imagine a word processing tool other than as a shallow imitation of the Redmond Behemoth. So what’s wrong with it?

I’ve been using word processors and text editors for nearly 30 years. There was an era before Microsoft Word’s dominance when a variety of radically different paradigms for text preparation and formatting competed in an open marketplace of ideas. One early and particularly effective combination was the idea of a text file containing embedded commands or macros that could be edited with a programmer’s text editor (such as ed or TECO or, later, vi or Emacs) and subsequently fed to a variety of tools: offline spelling checkers, grammar checkers and formatters like Scribe, Troff and LaTeX that produced a binary page image that could be downloaded to a printer.

These tools were fast, powerful, elegant and extremely demanding of the user. As the first 8-bit personal computers appeared (largely consisting of the Apple II and the rival CP/M ecosystem), programmers tried to develop a hybrid tool called a word processor: a screen-oriented editor that hid the complex and hostile printer control commands from the author, replacing them with visible highlight characters on screen and revealing them only when the user told the program to “reveal codes.” Programs like WordStar led the way, until WordPerfect took the market in the early 1980s by adding the ability to edit two or more files at the same time in a split screen view.

Then, in the late 1970s and early 1980s, research groups at MIT and Xerox’s Palo Alto Research Center began to develop the tools that fleshed out the graphical user interface of workstations like the Xerox Star and, later, the Apple Lisa and Macintosh (and finally the Johnny-come-lately imitator, Microsoft Windows). An ongoing war broke out between two factions. One faction wanted to take the classic embedded-codes model and update it to a graphical bitmapped display: you would select a section of text and mark it as “italic” or “bold” and the word processor would embed the control codes in the file and, when the time came to print the file, it would change the font glyphs being sent to the printer at that point in the sequence. But another group wanted to use a far more powerful model: hierarchical style sheets. In a style sheet system, units of text – words or paragraphs – are tagged with a style name, which possesses a set of attributes that are applied to the text chunk when it’s printed. Styles can in turn inherit attributes from a parent style (a heading style might be based on the body text style, for example), which is what makes the scheme hierarchical.
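A toy Python model may help show why mixing the two paradigms goes wrong. This is a hypothetical sketch, not how Word or any real word processor stores documents: the `Style` and `Run` classes and the attribute names are invented for illustration. Styles inherit attributes from a parent style; direct formatting is stored on the text run itself, much as an embedded control code would be.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Style:
    """A named style; attributes not set here are inherited from the parent style."""
    name: str
    attrs: dict
    parent: Optional["Style"] = None

    def resolve(self) -> dict:
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.attrs}  # the child's settings override the parent's


@dataclass
class Run:
    """A chunk of text tagged with a style, plus optional direct (embedded-code) formatting."""
    text: str
    style: Style
    direct: dict = field(default_factory=dict)

    def effective(self) -> dict:
        # Direct formatting always wins, so it silently ignores later style changes.
        return {**self.style.resolve(), **self.direct}


body = Style("Body", {"font": "Times", "size": 12, "bold": False})
heading = Style("Heading", {"size": 18, "bold": True}, parent=body)

real_heading = Run("Chapter One", heading)
fake_heading = Run("Chapter Two", body, direct={"size": 18, "bold": True})  # hand-formatted

# The designer changes the house font and shrinks all headings.
body.attrs["font"] = "Garamond"
heading.attrs["size"] = 16

for run in (real_heading, fake_heading):
    print(run.text, run.effective())
```

Changing the Heading style updates the properly tagged heading, but the hand-formatted “heading” keeps its old size: once the two models are mixed, restyling the document no longer restyles all of it.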

Microsoft was a personal computer software company in the early 1980s, mostly notable for its BASIC interpreter and MS-DOS operating system. Steve Jobs approached Bill Gates to write applications for the new Macintosh system in 1984, and Bill agreed. One of his first jobs was to organize the first true WYSIWYG word processor for a personal computer – Microsoft Word for Macintosh. Arguments raged internally: should it use control codes or hierarchical style sheets? In the end, the decree went out: Word should implement both formatting paradigms, even though they’re fundamentally incompatible and you can get into a horrible mess by applying simple character formatting to a style-driven document, or vice versa. Word was in fact broken by design from the outset – and it only got worse from there.

Over the late 1980s and early 1990s Microsoft grew into a behemoth with a near-monopoly position in the world of software. One of its tactics became known (and feared) throughout the industry: embrace and extend. If confronted with a successful new type of software, Microsoft would purchase one of the leading companies in the sector and then throw resources at integrating their product into Microsoft’s own ecosystem, if necessary dumping it at below cost in order to drive rivals out of business. Microsoft Word grew by acquiring new subsystems: mail merge, spelling checkers, grammar checkers, outline processing. All of these were once successful cottage industries with a thriving community of rival product vendors striving to produce better products that would capture one another’s market share. But one by one, Microsoft moved into each sector and built one of the competitors into Word, thereby killing the competition and stifling innovation. Microsoft killed the outline processor on Windows, stalled development of the grammar checking tool, stifled spelling checkers. There is an entire graveyard of once-hopeful new software ecosystems, and its name is Microsoft Word.

The planned obsolescence of Word’s file formats, which have changed repeatedly from release to release, is of no significance to most businesses, for the average life of a business document is less than 6 months. But some fields demand document retention. Law, medicine and literature are all areas where the life expectancy of a file may be measured in decades, if not centuries. Microsoft’s business practices are inimical to the interests of these users.

Nor is Microsoft Word easy to use. Its interface is convoluted and baroque, making the easy difficult and the difficult nearly impossible to achieve. It guarantees job security for the guru, not transparency. For the zen adept who wishes to focus on the task in hand, not the tool with which the task is to be accomplished, it’s a royal pain in the arse and a perpetual distraction. It imposes its own concept of how a document should be structured upon the writer, a structure best suited to business letters and reports (the tasks for which it is used by the majority of its users). Its proofing tools and change tracking mechanisms are byzantine, buggy and inadequate for true collaborative document preparation; its outlining and tagging facilities are piteously primitive compared to those required by a novelist or thesis author; its macro language (a descendant of BASIC) is an insult to the intelligence of the programmer, and the procrustean dictates of its grammar checker would merely be funny if the ploddingly sophomoric business writing style it mandates were not so widespread.

But this isn’t why I want Microsoft Word to die.

The reason I want Word to die is that until it does, it is unavoidable. I do not write novels using Microsoft Word. I use a variety of other tools, from Scrivener (a program designed for managing the structure and editing of large compound documents, which is to Word roughly what a programmer’s integrated development environment is to a basic text editor) to classic text editors such as Vim. But somehow, the major publishers have been browbeaten into believing that Word is the sine qua non of document production systems. They have warped and corrupted their production workflow into using Microsoft Word DOC files as their raw substrate, even though this is a file format ill-suited for editorial or typesetting chores. And they expect me to integrate myself into a Word-centric workflow, even though it’s an inappropriate, damaging and laborious tool for the job. It is, quite simply, unavoidable. And worse, its very prominence blinds us to the possibility that our tools for document creation could be improved. It has held us back for nearly 25 years already; I hope we will find something better to take its place soon.

Publishers: What Are They Good For?

I think it’s important, when discussing the future of the book and the future of publishing, to start with an understanding of what publishers do today.

The job of the publisher is to take a manuscript (a written text or collection of text and illustrations) supplied by an author, turn it into a book and distribute the book to readers.

The publisher and the author may be the same person or organization, or the publisher may be a publishing house – a company or organization that publishes other people’s work. The publisher may be for-profit or non-profit. Publishers range from authors distributing their own work for free all the way to multi-billion-dollar multinationals with divisions that handle other kinds of media. But whatever the business model of the publisher, the job is what I outlined in the previous paragraph.

This sounds simple enough, but there are a lot of intermediate steps in publishing. Manuscripts aren’t usually publishable as delivered. In the old days they may well have been handwritten; these days they’re usually prepared on a computer, but they may contain typos, spelling mistakes, internal contradictions, libelous statements (which might get the publisher and/or author sued if they are published without alteration or fact-checking) and other flaws.

The general process of publishing a book resembles the old-school waterfall model of software development, with feedback loops between author and publishing specialist at each stage. The stages are, broadly speaking:

Substantive editing: An editor or reviewer reads the manuscript, and calls the author’s attention to errors, problems or high-level structural flaws in the book. The author then fixes these.
Copy editing: A copy editor checks the manuscript for grammatical and typographical consistency, correcting spelling mistakes and punctuation errors, preparing lists of names, titles and other uncommon terms for reference, and imposing the publishing house style on the book if appropriate. The author then reviews the copy edited manuscript and approves or rejects the CE’s changes.
Book design: Cover art is commissioned. A cover layout/design is prepared, using the cover art. Flap copy/advertising material is prepared. Review quotes are commissioned. The book package is then ready for typesetting.
Typesetting: A typesetter imports the copy-edited manuscript into a layout program – typically a DTP package such as Quark Publishing System or Adobe InDesign, but it may be a formatting command language such as LaTeX – then corrects obvious layout problems: ladders, runs, orphans and widows, hyphenation. The typesetter also prepares front matter and back matter such as a table of contents.
Indexing: Optional – an indexer prepares a list of keywords and generates an index from the typeset file; this generally goes into the back matter. The author may provide feedback on the keywords to use, or even provide the initial list.
Proofreading: A proofreader checks the page proofs – typically PDF files these days – for errors introduced at the typesetting stage. The author may also check the page proofs. Corrections are collated and fed back to the typesetter.
Bluelining: Final page proofs are prepared and re-checked for errors. The author is not usually involved at this stage, which may be described as second-stage proofreading.
Registration and marketing: The publisher registers an ISBN for the book and a Library of Congress (or other national library of record) database entry. A copy will be lodged with the relevant libraries. Additionally, Advance Reader Copies may be laser-printed, manually bound and mailed to reviewers (or electronic copies may be distributed). Advertisements may be placed in the trade press. Other marketing promotional activities may be planned at this stage (if there’s a marketing budget for the title and advance orders from booksellers indicate that promotional activities will generate sufficient extra sales to justify the expense).
Manufacturing: The publisher arranges to have the book blocks printed, bound into covers, and guillotined and trimmed. A dust jacket may also be printed and wrapped around the hardcover book. Alternatively, paper covers may be printed and the book block perfect-bound (glued into the cover using thermoplastic glue). Alternatively, a master e-book is generated from the typeset file and, optionally, uploaded to the DRM server (or distributed as-is without DRM).
Distribution: Copies of the physical book are shipped to warehouses or retailers. The e-book is released to the various commercial e-book store databases.

This waterfall process generally operates on a 12 month time scale. That’s not because it has to take 12 months: in extremis, a trade publisher can rush a topical current affairs title through in as little as 8 weeks from start to finish, including writing time, by editing and typesetting chapters as they are handed in by a team of authors. Rather, it’s because publishers operate a production pipeline, essentially a conveyor belt that takes in a number of manuscripts and emits the same number of finished books on a monthly basis. Everything runs in lockstep at the speed of the slowest supplier, because to do otherwise risks the production line stalling due to lack of inputs.
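The lockstep behavior can be sketched with a few lines of Python. This is a minimal model under stated assumptions, not a description of any real publisher’s scheduling: each stage holds one book at a time, all hand-offs happen together, and the stage names and durations are hypothetical.

```python
def lockstep_schedule(stage_days: dict[str, int], books: int) -> tuple[int, int, int]:
    """Model a conveyor-belt pipeline in which every stage hands work on at the same tick.

    Returns (tick length, time for one book to pass through, time to finish the batch).
    """
    tick = max(stage_days.values())  # each tick lasts as long as the slowest stage
    stages = len(stage_days)
    latency = stages * tick  # one book spends one tick in every stage
    batch = (stages + books - 1) * tick  # fill the pipe, then one book emerges per tick
    return tick, latency, batch


# Hypothetical stage durations in days; real figures vary by publisher and title.
stages = {"editing": 20, "copy editing": 30, "typesetting": 25, "proofreading": 15}
print(lockstep_schedule(stages, books=12))
```

Although the hypothetical stages add up to 90 days of actual work, each book spends 120 days in the pipe, because every hand-off waits for the slowest stage. The compensation is predictability: once the belt is full, it emits one book per tick.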

As much of the process as possible is outsourced. Publishers do not own printing presses. Copy editors are freelance workers, paid a piece rate per book copy-edited. Typesetting is carried out by specialist agencies. Artwork and design may be outsourced. In some cases, sales are outsourced. The only core activities that are always kept in-house are editorial, marketing and accounting – and editorial is as much about workflow management, and marketing as much about product acquisition, as either is about its official job title.

A major commercial publisher’s genre imprint may be emitting a handful of books a month – but the volume may be considerably higher. Tor, the largest science fiction and fantasy publisher in the United States, publishes approximately 300 books per year. Ace, DAW, Del Rey, Orbit – other genre imprints – emit 50-150 titles per year. In-house staffing levels are low; Tor employs 50-60 people full-time, so the ratio of books published to workers is roughly one book per employee per 2 months (plus perhaps another two months’ work by external contractors).

The upshot is that major publishers today operate extremely streamlined production workflows, with a ratio of perhaps five authors (content creators) per production worker (or a 3:1 ratio if we include external contractors).
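These ratios follow from the Tor figures given above. Here is a short Python sketch of the arithmetic, assuming one author per book and roughly two person-months of contractor work per title (the “perhaps another two months” estimate); the function name is hypothetical.

```python
def staffing_ratios(books_per_year: int, in_house_staff: int,
                    external_months_per_book: float = 2.0) -> tuple[float, float, float]:
    """Return (months per book per employee, authors per in-house worker,
    authors per worker once external contractors are counted).

    Assumes one author per book, and that contractors add a fixed number of
    person-months of work to every title.
    """
    months_per_book = 12 * in_house_staff / books_per_year
    authors_per_worker = books_per_year / in_house_staff
    contractor_fte = books_per_year * external_months_per_book / 12  # full-time equivalents
    authors_per_worker_all = books_per_year / (in_house_staff + contractor_fte)
    return months_per_book, authors_per_worker, authors_per_worker_all


for staff in (50, 60):
    months, ratio, ratio_all = staffing_ratios(300, staff)
    print(f"{staff} staff: {months:.1f} months/book, {ratio:.1f}:1, "
          f"{ratio_all:.1f}:1 with contractors")
```

Fifty to sixty staff gives two to two and a half months per book per employee, five or six authors per in-house worker, and roughly 3:1 once contractors are included.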

A handful of final points are worth noting:

The cost of manufacturing a book is surprisingly low – around 50 US cents for a paperback, rising to $2-3 for a hardback.
The cost of manufacturing an e-book is surprisingly high – if a publisher requires DRM, the DRM provider may charge up to 10% of the suggested retail price of the e-book for the (dis)service.
Of the retail price of a book, the publisher receives roughly 30-50%. The lion’s share of the revenue – 40-70% of the gross price – goes to the retail supply chain.
In general, trade publishers aim to make a profit on each book published equal to the physical manufacturing costs plus the (fixed) production costs (i.e. the costs of editing, typesetting, marketing and so on).
If the author’s agent has done their job properly, the author’s profit (a royalty paid per copy sold) will be approximately the same as the publisher’s profit. (The publisher makes themselves useful to the author by organizing the production workflow, marketing and distributing the product, accounting for sales and giving the author an advance against royalties – a non-returnable loan secured against anticipated future sales – which they can notionally live on during the writing and production phase of the project.)

This is what publishers do. Topics I haven’t covered include: the contractual basis for licensing publication rights to a book, the sales channels and pricing structure through which trade books are sold, how this spatchcock mess of an industry evolved and what the prospects are for its future development.

Feral Spambooks

Date October 10, 2013
Author Charlie Stross
Tags Future of Publishing, PGE, Published

In the future, readers will not go in search of books to read. Feral books will stalk readers, sneak into their e-book libraries and leap out to ambush them. Readers will have to beat books off with a baseball bat; hold them at bay with a flaming torch; refuse to interact; and in extreme cases, feign dyslexia, blindness or locked-in syndrome to avoid being subjected to literature.

You think I’m exaggerating for effect, don’t you?

Today, roughly 40-50,000 books are published commercially each year in the English language. But the number is rapidly rising, as traditional barriers to entry are fading away. Meanwhile, the audience for these works remains stubbornly static. The limits to reading are imposed by its time-rivalrous nature, in conjunction with the size of the English-reading population and the number of hours in the day. Tools that make writing and publishing easier work to increase the volume of work because the creation of books is to some extent an exercise of ego: we are all convinced that we have something of value to communicate, after all. It therefore seems inevitable that in future, there will be more books – and with them, more authors who are convinced that the existence of their literary baby entitles them to prosper from the largesse of their readers.

A burgeoning supply of books and a finite number of reader-hours is a predictor of disaster, insofar as the average number of readers per book will dwindle. The competition for eyeballs will intensify by and by. Many writers will stick to the orthodox tools of their profession, to attractive covers and cozening cover copy. Some will engage in advertising, and others in search engine optimization strategies to improve their sales ranking. But some will take a road less well-trodden.

Historically, publishers attempted to use cheap paperback novels as advertising sales vehicles. Books incorporated ads, as magazines and websites do today: they even experienced outbreaks of product placement, car chases interrupted so that the protagonists could settle down for half an hour to enjoy a warming dish of canned tomato soup. Authors and their agents put an end to this practice, for the most part, with a series of fierce lawsuits waged between the 1920s and 1940s that added boilerplate to standard publisher contracts forbidding such practices: for authors viewed their work as art, not raw material to deliver eyeballs to advertisements.

But we have been gulled into accepting advertising-funded television, and by extension an advertising-funded web. And as the traditional verities of publishing erode beneath the fire-hose force of the book as fungible data, it is only a matter of time before advertising creeps into books, and then books become a vehicle for advertising. And by advertising, I mean spam.

The first onset of bookspam went unnoticed, for it did not occur within the pages of the books themselves. Spam squirted its pink and fleshy presence into the discussion forums of Goodreads and the other community collaborative book reading and reviewing websites almost from the first. And we shrugged and took it for granted because, well, it’s spam. It’s pervasive, annoying and it slithers in wherever there’s space for feedback or a discussion.

But that isn’t where it’s going to end. An EPUB e-book file is essentially a bundle of HTML5 documents in a ZIP container, together with descriptive metadata and an optional DRM layer. The latest draft standard includes support for all aspects of HTML5 including JavaScript. Code implodes into text, and it is only a matter of time before we see books that incorporate software for collaborative reading. Not only will your e-book save your bookmarks and annotations; it’ll let you share bookmarks and annotations with other readers. It’s only logical, no? And the next step is to let readers start discussions with one another, with some sort of tagging mechanism to link the discussions to books, or chapters, or individual scenes, or a named character or footnote.
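Concretely, an EPUB 3 file is a ZIP archive holding (X)HTML5 content documents plus package metadata, which makes it easy to see where code could ride along. The Python below is a minimal sketch using the standard library’s zipfile module: an uncompressed `mimetype` entry first, a `META-INF/container.xml` pointing at the package document, the package metadata, a navigation document and one chapter carrying a script (flagged with the `scripted` manifest property). The title, identifier, file names and script body are placeholders, the result has not been run through a validator, and DRM is not shown.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: descriptive metadata, a manifest of every file, and the reading order.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Placeholder Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2013-10-10T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml" properties="scripted"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav></body>
</html>
"""

# An ordinary (X)HTML5 page; the script element is where reading-group code could live.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title>
<script type="text/javascript">/* collaborative-reading code would go here */</script>
</head>
<body><p>It was a dark and stormy night.</p></body>
</html>
"""


def build_epub() -> bytes:
    """Return the bytes of a minimal EPUB 3 file (a ZIP archive)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The container format requires "mimetype" as the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, text in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/nav.xhtml", NAV_XHTML),
            ("OEBPS/ch1.xhtml", CHAPTER_XHTML),
        ]:
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_epub())) as book:
    print(book.namelist())
```

Whether a given reading system actually runs the script is up to that reading system: EPUB 3 permits scripting but does not require readers to support it.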

Once there is code there will be parasites, viral, battening on the code. It’s how life works: around 75% of known species are parasitic organisms. A large chunk of the human genome consists of endogenous retroviruses, viruses that have learned to propagate themselves by splicing themselves into our chromosomes and lazily allowing the host cells to replicate them whenever they divide. Spammers will discover book-to-book discussion threads just as flies flock to shit.

But then it gets worse. Much worse.

Authors, expecting a better reaction from the reading public than is perhaps justifiable in this age of plenty for all (and nothing for many), will eventually succumb to the urge to add malware to their e-books in return for payment. The malware will target the readers’ e-book libraries. The act of reading an infected text will spread the payload, which will use its access to spread advertising extracts and favorable reviews throughout the reader communities. You may find your good reputation taken in vain by a second-rate pulp novel that posts stilted hagiographies of its author’s other books on the discussion sites of every book you have ever commented on (and a few you haven’t). Worse, the infested novels will invite free samples of all their friends to the party, downloading the complete works of their author just in case you feel like reading them. Works which will be replete with product placement and flashing animated banner ads, just in case you didn’t get the message.

Finally, in extremis, feral spambooks will deploy probabilistic text generators seeded with the contents of your own e-book library to write a thousand vacuous and superficially attractive nuisance texts that at a distance resemble your preferred reading. They’ll slide them into your e-book library disguised as free samples, with titles and author names that are random permutations of legitimate works, then sell advertising slots in these false texts to offshore spam marketplaces. And misanthropic failed authors in search of their due reward will buy the ad marquees from these exchanges, then use them to sell you books that explain how to become a bestselling author in only 72 hours.
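The “probabilistic text generators” imagined here need not be sophisticated. The simplest kind is a word-level Markov chain: record which words follow which in a seed text, then take a random walk through those transitions. A minimal sketch in Python, assuming a toy seed string stands in for an e-book library; it illustrates the general technique only, and the function names are invented for this example.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to every word that follows it in the seed text (duplicates kept)."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, rng: random.Random) -> str:
    """Random walk: each next word is drawn in proportion to how often it followed the current one."""
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:  # dead end: no word ever followed this one in the seed text
            break
        out.append(rng.choice(successors))
    return " ".join(out)


if __name__ == "__main__":
    seed = "the ship sailed into the storm and the storm broke the ship"
    chain = build_chain(seed)
    print(generate(chain, "the", 8, random.Random(42)))
```

Every adjacent pair of words in the output also appears somewhere in the seed, which is exactly why such text “at a distance” resembles its source while saying nothing.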

Books are going to be like cockroaches, hiding and breeding in dark corners and keeping you awake at night with their chittering. There’s no need for you to go in search of them: rather, the problem will be how to keep them from overwhelming you.

Reading Machines

Date: October 9, 2013
Author: Charlie Stross

One of the key attributes of reading is that – with very few exceptions – nobody else can do it for you. You have to plough through the whole thing yourself, or bounce from chapter to endnote, as is your wont: but nobody else can absorb the information on your behalf. (If a text can be reduced to a pre-digested summary, it was too long to begin with: or the digest is an incomplete representation.)

Reading is a rivalrous activity. You can listen to music or watch TV while doing something else, but you can’t (or shouldn’t) read a book while driving or mixing cocktails. Listening to audiobooks is only a partial work-around; studies suggest that knowledge retention is lower. Furthermore, audiobooks are slower. A normal tempo for spoken English is around 150-200 words per minute. A reasonably fast reader, however, can read 300-350 words per minute; a speed reader may absorb 500-1000 words per minute (although comprehension becomes an issue at that rate).
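To put those rates in perspective, here is a back-of-the-envelope calculation, assuming a hypothetical 100,000-word novel and taking the midpoint of each quoted range as the rate.

```python
# Hypothetical novel length; each rate is the midpoint of a range quoted in the text.
NOVEL_WORDS = 100_000

RATES_WPM = {
    "audiobook narration (150-200 wpm)": 175,
    "fast reader (300-350 wpm)": 325,
    "speed reader (500-1000 wpm)": 750,
}


def hours_to_finish(words: int, wpm: float) -> float:
    """Convert a word count and a words-per-minute rate into hours."""
    return words / wpm / 60


if __name__ == "__main__":
    for label, wpm in RATES_WPM.items():
        print(f"{label}: {hours_to_finish(NOVEL_WORDS, wpm):.1f} hours")
    # audiobook narration (150-200 wpm): 9.5 hours
    # fast reader (300-350 wpm): 5.1 hours
    # speed reader (500-1000 wpm): 2.2 hours
```

On these assumptions, listening to the novel takes nearly twice as long as a fast reader needs to read it.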

So, what kind of environment facilitates reading?

About fifteen years ago, I stumbled across my perfect reading machine – and didn’t buy it. It was on display in the window of an antique shop in Edinburgh, Scotland: a one-of-a-kind piece of furniture, somewhat threadbare and time-worn, and obviously commissioned for a Victorian gentleman who spent much of his time reading.

In form, it was an armchair – but not a conventional one. Every available outer surface, including the armrests, consisted of bookshelves. The backrest (shielded from behind by a built-in bookcase) was adjustable, using a mechanism familiar to victims of badly designed beach recliners everywhere. Behind the hinged front of the chair was a compartment from which an angled ottoman or footstool could be removed; this was itself a box, suitable for storing yet more books. A lap-tray on a hinge, supporting a bookrest, swung across the chair’s occupant from the left; it also supported brackets for oil lamps, and a large magnifying glass on an arm. The right arm of the chair was hinged and latched at the front, allowing the reader to enter and exit the reading machine without disturbing the fearsome array of lamps, lenses and pages. The woodwork was polished dark oak; the woven cushion covers had worn thin (attacked either by moths or by the former owner’s neglected feline).

While the ergonomics of the design were frankly preindustrial, the soft furnishings threadbare, and the price outrageous, I recognized instinctively that this chair had been designed very carefully to support a single function. It wasn’t a dining chair, or a chair in which one might sip a wee dram of post-prandial whisky or watch TV. It was a machine for reading in: baroque in design, but as starkly functional as an airport or a motorway.

I knew on the spot and of an instant that I had to own this reading machine. For that is what this thing was: an artifact designed for the sole purpose of excluding distractions and facilitating the focused absorption of information from books. Unfortunately, in those days I was younger and poorer than I am today – and the antique store owner, clearly aware of its unique appeal, had priced it accordingly. I went away, slept uneasily, returned the next afternoon to steel myself for expending a large chunk of my personal savings on an item that was not strictly essential to my life…and it had already gone.

These days, I do most of my reading on a small and not particularly prepossessing sofa in one corner of my office. I’m waiting for the cats to shred it sufficiently to give me an excuse to replace it with a better reading machine. When the time comes I will go hunting for something more comfortable: an Eames lounge chair and ottoman. Combined with an e-ink reader (with an edge-lit display for twilight reading), it should approximate the function (if not the form, or the bizarre charm) of the eccentric Victorian reading machine that still haunts my dreams.

Image courtesy of Wikimedia Commons

Why I’m Here – Charlie Stross

I’m Charlie Stross. I write for a living, but I’ve got a dirty little secret: I don’t understand books.

Books: a tool for conveying information – normally (but not exclusively) textual and pictorial information – from one person’s head to another’s. They’re not the only such tool, and they evolved iteratively from earlier forms. Clay or wax tablets, and bundles of leaves or tree bark, gave way to parchment scrolls and then, via Johannes Gutenberg, to bundles of “signatures” – big sheets of paper printed with text and pictures, folded and stitched and then cut along three edges – bound between leather or cloth or board covers. We’ve been refining the design and manufacture of these physical objects for hundreds of years.

Most recently, with the development of high capacity data storage media and low power/high resolution display panels, we’ve come up with machines that let us read and display text and graphics without needing the bulky, heavy lumps of bound paper. A 500-page hardback novel weighs roughly 650 grams; it contains up to 1MB of textual data. This was a remarkably compact form of information storage back in the day, but in the past couple of decades it has come to seem laughably restrictive. My iPad weighs the same as that hardback, but has roughly 64,000 times its data storage capacity – potentially enough to store an entire library. Moreover, digital data is searchable and (in principle) mechanically indexable. (Don’t mention this to a professional indexer, though, unless you enjoy being mocked; indexing is a highly skilled speciality, and one that is in danger of being destroyed by the reductionist assumptions of the software developers who build “just good enough” indexing tools into word processors.) Digression aside, what does it mean for the function of a book, the transfer of information from an author’s mind into a reader’s, when the book becomes an easily transferable chunk of data not bound to a physical medium?
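The crudest form of “mechanical” indexing is a concordance: a list of every word and the pages on which it appears. Here is a minimal sketch in Python, assuming a book already split into pages (the sample pages are invented). It is not how any particular word processor builds its index, but it shows the “just good enough” problem in miniature.

```python
import re
from collections import defaultdict


def build_concordance(pages: list[str]) -> dict[str, list[int]]:
    """Map each lower-cased word to the sorted, 1-based page numbers where it appears."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(seen) for word, seen in sorted(index.items())}


if __name__ == "__main__":
    pages = [
        "The cat sat on the mat.",
        "A feline, not a cat, ignored the mat.",
        "Cats shred sofas.",
    ]
    index = build_concordance(pages)
    print(index["cat"])     # [1, 2]
    print(index["cats"])    # [3]
    print(index["feline"])  # [2]
```

The machine dutifully records that “cat” appears on pages 1 and 2, but it files “cats” separately, cannot tell that the “feline” on page 2 is the same concept, and indexes “the” as diligently as anything else. Deciding what a passage is *about*, rather than which strings it contains, is the skilled part of an indexer’s work.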

We talk of publishing books, but there are many kinds of business that call themselves “publishing”. The trade fiction industry is structured and operates along radically different lines from peer-reviewed scientific journals, academic textbooks, dictionaries, map-makers, and graphic novels. All of these industries have the core function in common – transferring textual or graphical ideas between minds – and all of them traditionally ran on ink-on-paper printing, but their source material, editorial processes, marketing and distribution channels are so different as to be nearly unrecognizable to one another. An innovation in production that disrupts and revolutionizes one publishing industry sector may be irrelevant, inapplicable, or laughable to another. Such innovations may even surface in unrecognizable forms: the public preprint service for academic papers provided by Arxiv.org bears an odd resemblance to some of the urban fantasy/media fanfic aggregator websites if you squint at it in the right light – the workflow of submitting an astrophysics paper to Arxiv.org is eerily similar to that for submitting a Harry Potter fanfic to fanfiction.net.

We think of authors, especially authors of fiction, as being creative monoliths who have total control over the cultural artifact they produce – the mechanism for transferring ideas from Head A into Head B – but that’s not actually the case. Some authors write using an amanuensis or secretary. Some authors collaborate. Their manuscripts are then edited – both substantively, by an editor who reviews the structure and content and suggests changes or even rewrites sections, and at the copy level, by a copy editor who enforces syntactic and grammatical consistency and corrects spelling errors. The author may not be responsible for the final title of their work; they are almost certainly not responsible for the cover or other marketing adjuncts. Authors work as part of a complex ecosystem, which exists to generate inputs compatible with the production pipeline that results in physical books.

Again, we need to ask: how does the shift to books-as-data affect the processes by which books are created? Are some specialities or workflows no longer needed? Are other, new techniques required? The transition from hot-lead typesetting to Desktop Publishing (DTP) in the 1980s rendered human typesetters’ skills obsolete, but opened up new roles in layout and design for the more forward-looking professionals in that sector. Those roles, while heavily automated by DTP applications, nevertheless raised standards of book production quality across the board once the initial excesses of the “I’ve got a font so I’m going to use it!” school subsided. What is today’s equivalent of the transition from hot-metal typesetting to DTP, and what new skills and specialities is it going to generate?

I’ve been writing on this subject for most of an hour, and I’ve barely begun to scratch the surface. Two decades ago, in 1993, I thought I pretty much understood what a book was; now, in 2013, I’m far less certain, because the book has acquired a strange, shimmering, protean nature. Books are changing. And I’m here to take a look at how and why, and what they might look like a couple of decades hence.

Sprint Beyond the Book

Author: Charlie Stross