24 October 2009

The Books of Brin

In an earlier posting, 'Google Books - A Library to Last Forever', I suggested that Google co-founder Sergey Brin had accumulated well over 15,000 real books by the time he was 26. He still has a site up from his Stanford days with them all listed - My Favorite Books - a strange but just about credible collection with a heavy emphasis on SF and fantasy fiction, although almost all the world's classics are there from Aeschylus to Xenophon. There's Crowley and Huxley, Le Fanu and Lovecraft, Deepak Chopra, most of the Booker authors and an unusual number of female writers (some romantic or sword and sorcery) for a mere male to possess. I had a vague suspicion that these were not books read by him or even owned by him. How could the poisonous 'Protocols of the Elders of Zion' be a favourite book, let alone co-exist with Chris Rock's 'The Bitch Factor'? Through some data mining aided by Google I ascertained that indeed these were not Sergey's books, but part of an early web search project done while he was at Stanford. Below is the garage in nearby Menlo Park where Google was born and where I had imagined he kept these books.

Sergey Brin did not possess 15,000 books. In 1996/1997 he and three other Stanford guys were working on something called 'Dual Iterative Pattern Relation Expansion (DIPRE)'. As they put it:
We begin with a small seed set of (author, title) pairs (in tests we used a set of just five pairs). Then we find all occurrences of those books on the Web (an occurrence of a book is the appearance of the title and author in close proximity on the same Web page). From these occurrences we recognize patterns for the citations of books. Then we search the Web for these patterns and find new books. We can then take these books, find all their occurrences, and from those generate more patterns. We can use these new patterns to find more books, and so forth. Eventually, we will obtain a large list of books and patterns for finding them.
They chose five books - Isaac Asimov's 'The Robots of Dawn', David Brin's 'Startide Rising', James Gleick's 'Chaos: Making a New Science', Shakespeare's 'The Comedy of Errors' and Dickens's 'Great Expectations'. Each book seems significant in hindsight, and it is likely these data miners in the dawn had great expectations. Anyway, it was only the two science fiction books which produced usable patterns (three of them), and after searching 5 million web pages with these they found 105 patterns. Eventually, adding the word 'books', they produced 15,527 titles "with very little bogus data." These are the books listed on the web as 'Sergey Brin's favourite books.' Books were useful for establishing the search code because the author and the title are often close together on a page. To a civilian this stuff is mostly impenetrable, but it seems what they were doing was laying the foundations for the code used by Google, truly a licence to print money (so much that he is now contemplating launching a space ship...)
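For the curious civilian, here is a rough sketch in Python of the loop the quoted passage describes: start with seed books, find where they occur, turn the surrounding text into patterns, then use the patterns to find more books. It is a toy under stated assumptions and not the Stanford code. The two 'web pages', the five-character context window (CTX), the rule that a pattern needs at least two supporting occurrences, and all the function names are invented here for illustration. The real work also looked at URLs and had ways of rejecting patterns that were too general, none of which is attempted below.

```python
import os
import re
from collections import defaultdict

CTX = 5  # characters of context kept either side of an occurrence (assumption)


def common_suffix(strings):
    """Longest string that every input ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def find_occurrences(pages, books):
    """Find places where a known author and title sit close together."""
    occurrences = []
    for page in pages:
        for author, title in books:
            for first, second, author_first in ((author, title, True),
                                                (title, author, False)):
                regex = re.escape(first) + r"(.{1,20}?)" + re.escape(second)
                for m in re.finditer(regex, page):
                    prefix = page[max(0, m.start() - CTX):m.start()]
                    suffix = page[m.end():m.end() + CTX]
                    occurrences.append((prefix, m.group(1), suffix, author_first))
    return occurrences


def make_patterns(occurrences):
    """Group occurrences by the text between author and title, then keep
    only the context that all occurrences in a group share."""
    groups = defaultdict(list)
    for prefix, middle, suffix, author_first in occurrences:
        groups[(middle, author_first)].append((prefix, suffix))
    patterns = set()
    for (middle, author_first), contexts in groups.items():
        if len(contexts) < 2:  # one sighting is not a pattern (assumption)
            continue
        prefix = common_suffix([p for p, _ in contexts])
        suffix = os.path.commonprefix([s for _, s in contexts])
        if prefix and suffix:  # insist on an anchor at both ends
            patterns.add((prefix, middle, suffix, author_first))
    return patterns


def apply_patterns(pages, patterns):
    """Search every page for every pattern and collect (author, title) pairs."""
    found = set()
    for prefix, middle, suffix, author_first in patterns:
        regex = ("(?<=" + re.escape(prefix) + ")(.+?)" + re.escape(middle)
                 + "(.+?)(?=" + re.escape(suffix) + ")")
        for page in pages:
            for m in re.finditer(regex, page):
                a, b = m.group(1), m.group(2)
                found.add((a, b) if author_first else (b, a))
    return found


def dipre(pages, seeds, rounds=5):
    """Alternate between finding patterns and finding books until nothing new turns up."""
    books = set(seeds)
    for _ in range(rounds):
        found = apply_patterns(pages, make_patterns(find_occurrences(pages, books)))
        if found <= books:
            break
        books |= found
    return books


# Two made-up 'web pages' that list books in different styles.
pages = [
    "<ul>\n<li><b>The Robots of Dawn</b> by Isaac Asimov</li>\n"
    "<li><b>Startide Rising</b> by David Brin</li>\n"
    "<li><b>Dune</b> by Frank Herbert</li>\n</ul>",
    "Reading list; Frank Herbert: Dune; Ursula K. Le Guin: The Dispossessed; "
    "David Brin: Startide Rising; William Gibson: Neuromancer; and more.",
]
seeds = [("Isaac Asimov", "The Robots of Dawn"), ("David Brin", "Startide Rising")]

books = dipre(pages, seeds)
for author, title in sorted(books):
    print(author, "-", title)
```

On this toy corpus the two seeds teach the program the 'title by author' layout of the first page, which yields Dune. Dune and Startide Rising together then reveal the 'author: title' layout of the second page, which yields the Le Guin and the Gibson, so two books become five. It also shows how a list grown this way can end up holding titles nobody chose.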

Reading this list without knowing the above, it had seemed a strange and wondrous bunch of books. He even had a title that someone asked for this morning - D.E. Harding's mystical classic 'On Having No Head.' It is not impossible for a young person to accumulate 15,000 books: if he or she buys 30 books a week from age 15 to 25 (30 x 52 x 10 comes to 15,600) and has somewhere to put them, they can achieve it with ease. Regular attendance at library sales will help. I have seen such collections. The novelist Hanif Kureishi, who used to live near our shop in the early 1980s, had about 10,000 paperbacks and he was not yet thirty. I knew a teenage dealer with 20,000 books in a storage unit in the unpromising London suburb of Neasden. So it was entirely credible that Brin, a highly educated student, could have this quantity of books.

I first became suspicious when I came across books on the list by the obscure 1890s writer Dollie Radford. I knew her books because recently we bought some of her son's library from a relation. He had been a minor poet and a fringe Bloomsbury player (that's him below picnicking with handsome Rupert Brooke and RB's inamorata Noel Olivier and Virginia Woolf in a fetching headscarf). What was Sergey doing reading Dollie? When I googled the pair of them all was explained. He actually mentions Dollie in one of his papers, 'Extracting Patterns and Relations from the World Wide Web', noting that "one of the most surprising results was finding books which were not listed in major online sources such as The Young Gardener's Kalendar by Dollie Radford [Rad04]…"


Bloom's After Noon said...

You mention Hanif Kureishi who I used to see around Hammersmith. Didn't they (Frears etc.,) shoot a scene of his movie "Sammy and Rosie Get Laid" in your shop?

Bookride said...

You're not wrong! Great publicity it was too. Hanif was a keen buyer and a cool guy. I guess Frears must have come along to direct but I don't remember - we're talking 22 years ago. Thanks. N.