Interview with Robert F. Murphy

Meet Robert F. Murphy, Ph.D.
Ray and Stephanie Lane Professor of Computational Biology

Ray and Stephanie Lane Center for Computational Biology

Interview by Linda Cai (senior, CSD) and Lucy Li (junior, CSD)

Where are you from?

I was born in Brooklyn, New York, and grew up on Long Island except for a few years in rural Maryland. I went to college in Manhattan at Columbia, so I was in New York for most of my life until I was an adult, and then I went to Caltech for graduate school in 1974. The move to California involved great changes in climate, people, and everything. It was at Caltech that I got involved with computing. I was a biochemistry major at Columbia, and I wanted more rigorous, more quantitative analyses of the experiments that I was doing. I learned computing at the computer center, punching data onto cards to run jobs on their IBM mainframe. My advisor bought an early PDP-11 minicomputer, and I spent a lot of time interfacing instruments, writing device drivers and analysis programs. From that point forward, I have pretty much always had some fraction of myself doing computing and some fraction of myself doing biology.

When I first came to CMU, I had a small group of computer programmers and another of biologists. The two groups didn’t speak much to each other. About 15 years ago, I decided to make a change and unify the group. From then on, I basically only took people into my group who knew or were willing to learn both, who were willing to understand at a fundamental level both the biological problem and the computational methods.

This was because I came to a view that having strong foundations in both areas was going to be critical to progress. The main way in which the field of computational biology had grown was CS people learning what problems to solve by talking to biologists. You had this fairly inefficient exchange of questions and methods, which is typically the way that any interdisciplinary field begins to arise. The field had grown enough that it was clear that the people who were strong in both were in a better position to innovate. That’s when we started some of our efforts towards training students in computational biology. Our initial efforts were at the undergraduate level, but this eventually led to our current computational biology Ph.D. program.

What parts of computational biology are you specifically involved in or have the most interest in?

I can answer that both from the perspective of my research group and as the director of the Lane Center. My own research group is primarily concerned with the simple question: How can we build systems that are capable of learning where all proteins are located within all cell types under all conditions? The number of combinations of proteins, cell types, locations, and conditions is so high that we need an automated approach to conduct research; there are too many to test manually. The typical approach is to vary one of those variables at a time and see what happens. It’s my belief that that won’t succeed, especially because all of those variables can interact. The automated systems we seek involve not just the automated execution of the experiment, but also automated learning, where a computer identifies combinations of proteins and conditions to focus on or ignore. That is the fundamental problem that we’re facing, and it is leading us to use and develop active learning methods, where the learner has control over the data that is being fed to it, in contrast to traditional machine learning.

A traditional machine learning application is given some data, and does some analysis on that data (for example, finds clusters in the data). The paradigm behind active learning is that, when it is not feasible for one reason or another to measure all of the variables defining the space that you are interested in learning, the learner decides which of them to actually measure. Companies like Netflix use systems like this. In the case of Netflix, I’m sure they’d be much happier if the user rated every movie; then they’d have a very good model of what you like and what you don’t like. However, they recognize that rating all movies is too taxing for a user, and they can only get you to answer so many times. Instead, they’re controlling the data acquisition, building a model of your tastes based on what you have seen and by asking a few well-chosen questions every so often. They’ll give suggestions to you and ask you whether you liked them. So that’s an example of active learning.

The underlying insight behind systems like that one is that there will be correlations. If you like movies by one director, you’re likely to like movies by directors with similar styles. If there were no correlations, they wouldn’t be able to make any predictions and wouldn’t be able to tell what you like and don’t like.
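To make the idea concrete, here is a minimal Python sketch of the kind of loop described above. It is only an illustration under simplifying assumptions, not the method used by Netflix or by the Lane Center: the movie names, the `director_of` table, the `ask_user` callback, and the rule of "ask about the director we know least about" are all hypothetical. The learner has a small budget of questions, chooses which movie to ask about, and relies on the correlation between movies by the same director to predict ratings for everything it did not ask about.

```python
# Toy active learning loop: the learner chooses which movie to ask about.
# All names and data are hypothetical and exist only for illustration.

def predict(movie, ratings, director_of):
    """Predict a rating as the mean of known ratings for the same director."""
    same = [r for m, r in ratings.items()
            if director_of[m] == director_of[movie]]
    return sum(same) / len(same) if same else None  # None: nothing known yet

def choose_query(unrated, ratings, director_of):
    """Pick the unrated movie whose director has the fewest known ratings."""
    def known_count(movie):
        return sum(1 for m in ratings
                   if director_of[m] == director_of[movie])
    # sorted() makes tie-breaking deterministic (alphabetical order)
    return min(sorted(unrated), key=known_count)

def active_learn(director_of, ask_user, budget):
    """Ask at most `budget` questions, then predict the remaining ratings."""
    ratings = {}
    unrated = set(director_of)
    for _ in range(budget):
        if not unrated:
            break
        movie = choose_query(unrated, ratings, director_of)
        ratings[movie] = ask_user(movie)  # the only "experiment" performed
        unrated.remove(movie)
    predictions = {m: predict(m, ratings, director_of) for m in unrated}
    return ratings, predictions

# Hypothetical catalog: three movies each by directors "A" and "B".
director_of = {"A1": "A", "A2": "A", "A3": "A",
               "B1": "B", "B2": "B", "B3": "B"}

def ask_user(movie):
    """Simulated user who loves director A (5) and dislikes director B (1)."""
    return 5 if director_of[movie] == "A" else 1

ratings, predictions = active_learn(director_of, ask_user, budget=2)
print(ratings)      # the two movies the learner chose to ask about
print(predictions)  # predicted ratings for the four movies it never asked about
```

With a budget of two questions, this learner asks about one movie from each director rather than two from the same one, which is what lets it predict all four remaining ratings. If the ratings were uncorrelated with the director, no choice of questions would help, which is the point made above.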

In my group, we are using active learning to acquire and analyze microscope images. We figure out where a protein is located in a particular cell, depending on cell type and other conditions. Stimulating those kinds of applications of active learning is one of the major missions of the Lane Center.

There has been a paradigm shift in biology in the last 25 years or so. The primary paradigm had been reductionist, meaning that in order to understand a complex system like a person, you should try to take it apart and learn how the parts function independently; then you’ll be able to put the parts together and know how the system functions as a whole. A great example of this is a story my dad told when I was growing up. When he was a teen, his dad brought home a car that didn’t work, put it in the driveway, and said, “If you fix it you can drive it.” My dad knew nothing about how cars worked, but cars had modular systems. When you took the wheels off you could figure out how the brakes worked or how the steering worked, without knowing yet how the engine worked. Eventually, you could figure out how the whole car worked. The parts didn’t work in pure isolation, but had a small number of interactions, so that you could still learn from the isolated modules.

What happened in the last 20 or so years is the emergence of the idea that biological organisms are not like that; it’s like the brakes are connected to the carburetor and the carburetor opens the door, or whatnot. That gave rise to the systems biology paradigm. It isn’t sufficient to look at things in modularized components; we have to look at the whole system and understand the properties emerging from the whole system. There is a lot of modeling and computation associated with systems biology. These are complicated systems – they are very big.  The nature of experimentation changed, so instead of measuring small numbers of things, we measure a large number of things, like a whole genome or set of proteins. A lot of work is being done developing approaches to studying thousands, tens of thousands, of proteins or genes, all at once. However, fundamentally it does not address the issue of what conditions, what cell types, what tissues, and what individual to study. The fundamental question becomes: How can we build a model of a complex system when we can look at a relatively small number of conditions/experiments?

The motivation behind the Lane Center was to apply machine learning methods to very thoroughly understand complex systems, and to use active learning methods that automate some of the model creation and control the next set of experiments.

There are some very interesting computational challenges associated with that in terms of making learning systems for complex systems like people, in terms of discovering where to expand experimentation.

The Lane Center seeks to be catalytic: we’re not going to be able to do enough experiments here at Carnegie Mellon to build entire models of human beings. Instead we are developing methodologies that can be used by scientists around the world to collaborate in building them. Some other interesting questions arise, assuming we are successful at building such models. Issues that aren’t computational or biological anymore arise: How do scientists and society react to systems that have a greater level of understanding than scientists themselves?

That’s essentially what we end up with. We’ll have a learning system that has taught itself what is important that is going on in a person, and yet scientists and physicians will not really have the entirety of the understanding themselves. That understanding will live inside of the machine. A very interesting question is: What are the implications, for science and for medicine? Human judgment has always been at the forefront, or a critical component in those fields. What happens when a machine has a deeper understanding than people have?  We’re also interested in that question.

Please describe your role as Director of the Lane Center for Computational Biology.

My role is to coordinate the efforts of existing faculty, postdocs and students, and recruit new people to achieve our mission involving machine learning and biological systems.

From a research perspective, we have some very talented faculty here – computational biologists, biologists, engineers and computer scientists. I think the Lane Center also has an important role to play from an educational perspective. Our existing faculty help greatly in training grad students and Lane fellows to learn the approaches that we’re taking here so that they can bring them out to the rest of the world. We play a leadership role in computational biology education at the undergrad and grad levels.

The field is interdisciplinary, and what is being studied can be complex. How do you keep up with all that you need to know or be proficient in, for your projects?

The short answer is that I rely heavily on my students, post-docs and fellow faculty. I obviously go to a number of conferences where cutting edge research in this field is discussed. Keeping abreast of what is current is nowhere near the challenge it was when I was a grad student, because the internet has made a lot of this much easier than it used to be. It’s easier, but still a major challenge.

I should stress what a wonderful environment CMU has been for me, as a locus for the efforts we’ve been trying to do in computational biology and as a source of great ideas on cutting edge approaches. We’ve received tremendous support from all of the departments involved. This is especially true for the Biological Sciences department in the Mellon College of Science, which initiated the University’s focus on computational biology.  We received support especially from Elizabeth Jones, who was the department head, Richard D. McCullough, who was the dean in the early 2000s, and the current dean, Fred Gilman. On the SCS side, for me, the Machine Learning department head Tom Mitchell has been a tremendous colleague, involving my group in what was the Center for Automated Learning and Discovery (now the Machine Learning Department), working as a great partner in recruiting faculty and attracting research funding.

In the Carnegie Institute of Technology, especially in the biological engineering department,  there are people like Jelena Kovacevic, with whom I founded the Center for Bioimage Informatics, which researches automated image analysis and is a major home for that type of work here at CMU.

The Lane Center becoming the latest academic unit within SCS, which just happened last week, is something that we’re very excited about. I’ve been very grateful for the support that we’ve gotten from the wider SCS community, especially the dean, Randy Bryant, and from the provost and president as well.

How is research like yours going to change the world in the coming years?

There are a couple of answers. Many of the machine learning methods that we’re using now have immediate applications, whether it’s learning to recognize when a particular tissue is becoming cancerous by virtue of its structure or expression changes, or whether it’s building models that correlate one protein’s variables to another’s, thereby enabling one to predict the change in protein behaviors. All of these can have immediate impact. That’s where most of computational biology is today. We have a number of people here today working on these methods, and we are at the cutting edge.

Longer term changes are going to come from the active learning focus. It’ll greatly change how biology and medicine are done. We’ll see about how long it takes.

An idea pioneered by Lee Hood at the Institute for Systems Biology in Seattle is to have highly customized therapy for individuals. They’ve been pushing for personalized and predictive models that would, based on an individual’s genome and history, predict their future medical conditions. Lee Hood was one of the first people to push for this notion of personal models customized for individuals. That is the kind of change that the whole field of systems biology is working toward. We see a role for the Lane Center in enabling the creation of those models. We feel that it won’t be possible without active learning.

Well, when you talk about “customized treatment,” it sounds just like regular medicine. I mean, if you have an infection, you’re given a particular medicine, and otherwise not, right? But personalization goes past that. Perhaps two people have the same disease, but based on genetics, they can be given different treatments. This is a very simple example, but it applies the principle that any treatment might need to be customized depending on the makeup of a particular person. That’s the long term vision of the systems biology field, and our role is in helping to achieve that.

What was your experience in Germany?

I had the opportunity to go to Freiburg, a very nice city in southwestern Germany, on the edge of the Black Forest. It’s primarily a university town, and an absolutely beautiful city situated on a small river, surrounded on a couple of sides by hills and mountains. I went there on an Alexander von Humboldt Award, an award for senior scientists to spend time in Germany. I stayed for five months, from late spring to early fall of 2008. I also became affiliated with the Freiburg Institute for Advanced Studies, a new interdisciplinary school that was created through the University of Freiburg. I’m an external senior fellow of that institute and I’ve been going back intermittently to collaborate with colleagues and partake in events. One of the first attempts that we’ve made to validate active learning methods for experimental biology was done this past summer with colleagues in Freiburg. It’s been a very good experience being there for me and my family, because it gives us the opportunity to see a different way of life, a different culture, and a society that takes ecological responsibilities, education, and research very seriously.

What was it like to develop the first formal undergraduate program in computational biology? How did you see the need for it?

A part of that was coming out of my own experience. From when I was 13, I wanted to do biological research. The path to doing that involved this sort of accidental discovery of computer science. I was fortunate in that I found out about computer science early enough that I was able to keep up with the developments occurring in the past 35 years while they were happening. I want to provide that kind of grounding in both computer science and in biology that is necessary to move the field forward. It’s like the founding of the field of biochemistry 50 years ago. There was a biology department and a chemistry department. They didn’t really speak much to each other. Biologists didn’t deal very much with the molecular aspects of biological systems. That has obviously changed, and the field of biochemistry arose out of people who learned both biology and chemistry. It’s the same for computers and biology. We learn about computational approaches that are necessary to research biological systems.

We started that undergrad program hoping to give an opportunity to those who wanted that mixture. I think the right place to start is at the college level. There were many that started at the grad level.  My feeling is that it’s harder to start as late as the graduate level. It’s beneficial to have that deep training from the beginning. That’s the premise of the undergrad program.

I got involved in doing it at the same time that the dean of the Mellon College of Science, Bob Sekerka, wanted to create computational programs in each of the college’s departments.

What are your thoughts on the Lane Center becoming part of the School of Computer Science?

Computational biology has gone through a number of transitions and has been growing. The model that we’ve been using to date has been a traditional center type model where you bring in people from different departments and they work together. We’ve been happy with that. One of the motivating factors behind having the Lane Center become a department was that faculty who are considered computational biologists and who work in that area are best recruited and evaluated by other computational biologists. It’s the reason why we have departments in any area. You view that department as having a particular expertise, which is embodied in its faculty, and faculty evaluate each other in those terms. With department status, we have the ability to recruit faculty for appointments directly within the Lane Center. I’m also very pleased because the Lane Center becoming a department within SCS represents a statement by the university of how seriously it takes computational biology. So I’m very excited about it for those reasons.

What were some decisions you made regarding the development of the Gates-Hillman Complex for the Lane Center floor?

We didn’t make many decisions. There was a very complex process by which space was allocated to the different departments. We were allocated a certain amount of space and we figured out how to use it. We’re very excited to be in the Gates Hillman Center.

What are your hobbies and passions outside of computational biology?

Well, I like to bike a lot, especially when I’m in Freiburg, where it’s flatter. I play basketball a few times a week, which is how I managed to get this (points to black eye).  I enjoy various board games and card games with my family.

What interesting college experiences did you have as an undergraduate at Columbia?

I very much enjoyed my undergrad education. I guess one of the most interesting things was that I went to a dinner where the guest of honor was Arthur Miller, the playwright, and it was one of the first dinners I went to in college. I was a John Jay scholar, a program where they had these dinners for the scholars. I was looking forward to this dinner tremendously, and I had all sorts of questions that I wanted to ask Mr. Miller. Then they brought out a squab (a type of pigeon) as the main course, and I had never seen anything like that. I was so distracted by my attempts at disassembling this squab that I never managed to ask any of my questions. I was worried about embarrassing myself by committing some faux pas.

Do you have any advice for students, whether undergraduate or graduate?

It’s not particularly novel advice. Take advantage of everything, even if you don’t see a way it’s necessarily going to impact your future. The opportunity to learn things in different areas while you are a student creates the capabilities that you will draw on when solving problems in the future. Being able to recognize when something is an important problem, and when a problem can be addressed, may come up later in your work or your life in a way you didn’t anticipate. It’s a fairly common bit of advice but I do firmly believe it. Take advantage of all types of educational opportunities, and ensure you achieve an education that is broad as well as deep.

Any favorite quotes?

Andre Malraux –
The greatest mystery is not that we have been flung at random between the profusion of matter and of the stars, but that within this prison we can draw from ourselves images powerful enough to deny our nothingness.