In Memory of John Westbrook

It is with great sadness that I write these words to honor John Westbrook at the time of his passing. I am compelled to do so to highlight the contributions he made to structural biology and structural bioinformatics. Much of his work involved the nuts and bolts that made all the science happen. That, coupled with his humble demeanor and inherent shyness, meant he was more often than not a middle author on papers that have had, and are still having, a profound impact on the fields he touched. In short, like many scientists, he was an unsung hero.

I worked closely with John for 17 years and know the hero he really was. It is never too late to sing those praises, alas, even if it is only in his memory.

I first met John around 1990 when he, working with Helen Berman at Rutgers University, had created the Nucleic Acid Database (NDB). At the time I was at Columbia University experimenting with object-oriented programming and object-oriented databases in an effort to ask questions of the fast-growing body of structural data. John and Helen’s approach was relational and pragmatic; mine, in collaboration with the computer scientist Calton Pu, was slow and risky. It was that combination that was to characterise our relationship throughout our work together on the RCSB PDB, which John continued until his death.

We, like many others, notably Janet Thornton, Shoshana Wodak, and Steve Bryant, recognized that regardless of the framework used to represent the PDB data, the data itself was inconsistent and error prone. Around this time the International Union of Crystallography, under the auspices of Syd Hall, Brian McMahon and others, had created the Crystallographic Information File (CIF) for the representation of small molecule crystallographic data as a consistent archival format. Under the leadership of Paula Fitzgerald, Helen Berman, Keith Watenpaugh, Brian McMahon, John and I set about creating the Macromolecular Crystallographic Information File (mmCIF), building on the CIF work. What came from this was the ontology that underlies much of structural biology today. We would meet on weekends on the Rutgers campus, carving out definitions and their relationships. John would listen and then, in the work weeks that intervened, turn the conceptual framework penned very much by Paula into machine-readable definitions and working code. In that sense, working mmCIF was very much his invention. A landmark meeting in York, UK in April of 1993 cemented the idea that mmCIF could be more than an archival format for macromolecules: it could be a conceptual schema from which different data representations could be derived. It comprised a Dictionary Description Language (DDL), a dictionary conformant with that language and data files conforming to the dictionary, all self-defining and laid out mostly by John. So much more could be said about John’s early contributions. The initial paper [1] is telling. John loved to code, but not to write papers. I drafted the first version of that paper while on vacation in Hawaii and John ended up a middle author. That was who he was. It was about getting it done, not about being academically rewarded. We later wrote a paper [2] when ontologies became the rage and we tried to convince folks that we had developed a rich and significant ontology. At least John was the first author of that.

John’s work defined what the RCSB PDB was under the hood when, under the leadership of Helen Berman, her group at Rutgers, Gary Gilliland’s group at NIST and mine in San Diego took over the PDB from Brookhaven National Laboratory. The first version comprised a Sybase relational database and a home-grown object-oriented database with a Perl CGI wrapper tying them together (those were the days).

With that and later enterprise frameworks written in Java, John, Helen, TN Bhat at NIST and others set about a data cleanup operation that continues to this day. The ability to identify anomalies, enrich definitions and expand the scope of the science being described can be attributed to the flexibility of mmCIF and John’s vision for that representation. The early days were not easy. Software tools to work with mmCIF were few and far between, and the format was complex with limited documentation. It is a testament to John’s persistence, striving for perfection and gentlemanly approach that mmCIF became what it is today: the standard data definition that defines a field of study.

John was the Bob Newhart of structural biology and structural bioinformatics. Mostly with an anguished and deadpan look on his face, he would come out with the driest humor that would have us all cracking up. Never was this so evident as when we had annual RCSB PDB team building exercises doing silly things with straws, bits of paper and the like.

As structural biology became more complex in numbers of structures, size of structures, types of methods, global reach and general growth in the field, John was always there playing a vital role, often from the backseat. Literally so, as evidenced by him and me sitting in the second-to-last row at the PDB 40th anniversary celebration [3]. It is fitting that finally in 2016 he received the inaugural biocuration career award from the International Society for Biocuration. Biocurators are the unsung heroes of much of what we do and John Westbrook is one of their leaders. May he rest in peace.

  1. I’m saddened to hear of John’s passing. I always enjoyed my interactions with him at conferences and RCSB events. Thanks for sharing your memories of him.

About Phil Bourne

Stephenson Founding Dean of the School of Data Science and Professor of Data Science & Biomedical Engineering, University of Virginia