Interview with Jaime Carbonell
Where are you from?
I was born in Uruguay in South America, and I grew up in the Boston area, specifically Lexington and Concord, both western suburbs of Boston. I went to school at MIT, so I lived in Cambridge for a while, then went to grad school at Yale, and I’ve been at Carnegie Mellon ever since.
How did you become interested in computer science? We noticed you studied math and physics as an undergraduate.
That’s a good question. When I was getting my undergraduate degree in math and physics, I was working on astrophysics, which required programming some early computers to do spectral analysis and other kinds of astronomical observations. I got very good at programming those computers in assembly language [laughs] to the point that I kept writing programs for other physicists and became the chief programmer instead of the chief physicist. So I decided to pursue what I was good at, and it was computer science after that.
How did you come about founding the Center for Machine Translation, and how did it eventually become the Language Technologies Institute?
So at Carnegie Mellon, there was actually an explicit effort to help internationalize the university in the 80s. Several efforts got started at the time in parallel, one of which was machine translation, because languages were of interest internationally. The field was also immature, so there was a significant opportunity to move it forward. We had connections with other universities and corporations in some European countries, Japan, Korea, and domestically, so when we started the Center for Machine Translation, we hired professors from all over, e.g. Russia, Japan, Germany, and the U.S. of course. That was a small factor that helped broaden Carnegie Mellon’s international recognition, permitting communication across language barriers, and it was pretty successful.
The Center for Machine Translation launched in 1986, but we had been working on it for two years before it was declared a center. At Carnegie Mellon, we had also worked on search engines and speech recognition. Before Google essentially took over, Lycos was the first large-scale search engine, and its founder was Michael Mauldin, one of my first graduate students, whom everyone knew by his nickname, “Fuzzy”. Also in parallel, Raj Reddy had been building speech recognition systems, specifically speech-to-text mapping. In the philosophy department, there was a program in computational linguistics, which taught the more theoretical side of language structure and semantic and syntactic analysis. Then in 1996, we put everything under one roof and that became the LTI.
What are your current research goals?
Recently, my own research goals have moved away from the main language technologies that I’ve spent my life working on. They’re in three areas now: one in proactive machine learning, another in computational structural biology, and the third in green energy, particularly wind energy optimization. Those are my personal interests, which diverge from the main body of LTI, where all sorts of language work is still happening in machine translation, speech recognition, speech synthesis, search engines, and computational extraction of information from large amounts of text. So the institute is still expanding in those areas. After working in that area for a long time, I decided to go off and try some new things.
The first area is proactive learning. The basic idea of machine learning is that you’re shown examples of a class, or an action, or a process, and the machine is supposed to learn characterizations of other examples of that class or how to conduct other behaviors equivalent to those actions. For example, if I show tables, like a kitchen table, this table, and that funny contraption over there [points to weirdly shaped table in office], can the machine recognize it as a table? And if I show it a chair, can it learn that it’s not a table? So object recognition is just one example. Another example is diseases. When diagnosing patients, can the machine learn if the disease is the same one as another patient’s disease, some other disease, or a new disease? Yet another example is the diagnosis of mechanical breakdowns. Can the machine figure out what the cause was so we can repair it immediately?
Proactive machine learning asks: if the examples don’t come passively, how can the machine go out and get information? You can think of it as an intelligent student who knows what questions to ask. If the machine that was shown tables has never been shown an example of a rectangular table, it may want to ask me whether the rectangular table is a table or not. Essentially, the system has to know what it does and doesn’t know, and then ask questions about what it doesn’t know to improve its learning. In the disease diagnosis case, it knows that certain symptoms map to this disease and that disease. So when it is analyzing symptoms and cannot tell with high certainty whether it is one disease, another disease, or a new disease, it will ask what kind of disease is present. It’ll also know when not to bother asking. Irrelevant questions will not get it anywhere, e.g. if it knows it is not one disease, it will not ask more questions about that disease. So proactive machine learning is basically trying to make the machine smarter about improving its own rate of learning.
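Editor’s note: the “intelligent student” idea can be sketched in a few lines of Python. This is not the proactive learning method from Carbonell’s research; it is a simpler, standard idea usually called uncertainty sampling, in which the learner asks about the example it is least sure of and skips questions whose answer it can already guess. The features (leg count, height in metres), the nearest-centroid classifier, the threshold, and the names margin, next_question, and should_ask are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def margin(x, labeled):
    """Gap between the distances from x to the two closest class centroids.

    A small gap means the learner cannot tell the classes apart for x.
    Requires at least two labeled classes.
    """
    dists = sorted(math.dist(x, centroid(pts)) for pts in labeled.values())
    return dists[1] - dists[0]

def next_question(labeled, unlabeled):
    """Pick the unlabeled example the learner is least sure about."""
    return min(unlabeled, key=lambda x: margin(x, labeled))

def should_ask(x, labeled, threshold=0.1):
    """Only bother the teacher when the learner is genuinely unsure (assumed threshold)."""
    return margin(x, labeled) < threshold

# Hypothetical training data: (number of legs, height in metres).
labeled = {
    "table": [(4, 0.75), (4, 0.70)],
    "chair": [(4, 0.45), (4, 0.50)],
}
unlabeled = [(4, 0.74), (4, 0.46), (4, 0.60)]

query = next_question(labeled, unlabeled)
print(query)  # (4, 0.6): halfway between the two classes, so worth asking about
```

With these made-up numbers the learner asks about the object of middling height and does not bother asking about the two objects that sit close to a known class.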
Let’s switch to computational structural biology. Gene sequences get transcribed from DNA to RNA, and then go to the ribosomes where they become proteins. Proteins are 3D structures of amino acids. The process of becoming 3D is called folding. Helices and sheets are formed, and so are barrels and some other irregular shapes. Without proteins, there would be no life as we know it. All your organs are made of proteins. Your kidneys cleanse the blood, for example, and this is done with proteins. It’s important to understand the structure of proteins in order to understand their functions and to design drugs that will interact with a protein, to replace or fix proteins, or even to stop a process. An example is the plaque protein in Alzheimer’s disease, which gets in the way of neurotransmission in the brain. We could potentially design another protein to bind to the plaque protein to inhibit it. This would not be a cure, so the patient would have to be chronically on the drug, but this would still be good.
So to design new drugs, we need to understand protein structure. What we’re doing right now is inferring the structure of a protein from its amino-acid sequence. It is very difficult to observe the structure of a protein directly; proteins are too small for electron microscopes to see. There’s a complicated method, which crystallizes the molecule and uses X-ray diffraction to view it, but this takes up to a year to identify one protein’s shape, and there are millions of proteins. Thus, inferring the structure of proteins using computational methods is necessary, even if it is only with 98% certainty. I’m oversimplifying the process, not even taking into account free energy calculations and so on. Proteins are regulated by their surrounding environment, so they may not even have the same structure given a different environment. The basic question therefore is, given a sequence, what structure will it assume inside the body?
My third research area is the newest one, which I just started: Why don’t we get most of our energy from the wind and sun? The problem isn’t insufficient energy. There is enough wind in the U.S. to generate ten times more than our energy needs (some people argue forty times). But the three main problems are the upfront costs, efficiency, and the fact that the energy production is intermittent, as opposed to constant. Sometimes the wind blows, sometimes it doesn’t. Sometimes the sun shines, sometimes it doesn’t. Nuclear power plants, by contrast, can run continuously. So these are the three primary challenges.
The cost of production depends on the efficiency of these machines, and wind is more efficient than solar. This may not be true forever, since the photovoltaic cells in solar panels may improve. Under current technology though, wind turbines, if placed correctly (that is, where the wind is), can generate electricity more cheaply than solar or wave energy or other renewables, but still not quite as cheaply as coal. So how do we get the world to convert to wind energy? Make it even cheaper and more reliable. One good thing about wind turbines is, if you build them right, they last a long time: 25 years, maybe up to 50 years, a lifetime. That’s less true of the big gas or coal power plants, because burning fuel wears out equipment that constantly needs to be renewed. Machines that take the sulfur out of coal (which is necessary so acid rain does not form) may break down, and they need to be fixed before the entire plant can run.
Renewable energy costs a lot up front, but the cost of wind energy can be amortized (where you divide the cost over the turbine’s lifetime). Computational models try to optimize the placement of the wind farms, the type of turbines, and their integration into the electric power grid. For example, a taller turbine that’s further from the ground will get stronger wind, but will also cost more. So is it worthwhile to build a taller turbine or not? That’s an optimization problem, and it has to be solved for the particular topography (ground configuration). If you’re at the top of a mountain ridge, the additional height is a small benefit, and it is hard to construct something on a mountain ridge, which requires anchoring the turbine into rock. If you are in a valley, the additional height makes a larger difference, since the wind at ground level is not as strong, so the turbines have to be taller. These are the kinds of tradeoffs we optimize.
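Editor’s note: the height tradeoff can be made concrete with a toy model. The sketch below is not the model used in Carbonell’s group. It assumes the common power-law wind profile, in which wind speed grows with height as (h / h_ref) raised to a shear exponent, and the fact that the power in the wind scales with the cube of its speed. Every number in it (the cost figures, the 25-year lifetime, the candidate heights, and the shear exponents standing in for “ridge” and “valley”) is hypothetical.

```python
def annual_energy(height_m, shear_exponent, ref_height_m=50.0, ref_energy=1.0):
    """Relative yearly energy at a given hub height.

    Wind speed follows v(h) = v_ref * (h / h_ref) ** shear_exponent,
    and available power scales with the cube of wind speed.
    """
    return ref_energy * (height_m / ref_height_m) ** (3 * shear_exponent)

def build_cost(height_m, base_cost=2_000_000.0, height_cost_factor=150.0):
    """Hypothetical build cost: a fixed part plus a part growing with height squared."""
    return base_cost + height_cost_factor * height_m ** 2

def cost_per_unit_energy(height_m, shear_exponent, lifetime_years=25):
    """Amortized cost: build cost spread over all energy produced in the lifetime."""
    lifetime_energy = lifetime_years * annual_energy(height_m, shear_exponent)
    return build_cost(height_m) / lifetime_energy

def best_height(shear_exponent, candidates=(60, 80, 100, 120, 140)):
    """Candidate hub height (metres) with the lowest amortized cost per unit of energy."""
    return min(candidates, key=lambda h: cost_per_unit_energy(h, shear_exponent))

# Assumed exponents: wind changes little with height on a ridge, a lot in a valley.
ridge = best_height(shear_exponent=0.10)
valley = best_height(shear_exponent=0.30)
print(ridge, valley)  # 60 100
```

Under these assumptions the shortest tower wins on the ridge, while the valley site justifies a taller and more expensive one, which is the pattern described in the answer above.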
Another question is, how far apart do the wind turbines need to be? They’re typically clustered because you buy a piece of land and put as many wind turbines on it as you can, as land or wind rights on land can be expensive. Now think about how the wind turbine works. Wind comes from one side, then it moves the turbine blades. Residual wind flows out the other side. If the prevailing wind direction is north to south in a particular place, then the turbines can be very close to each other in the east-west direction because they don’t interfere with one another. However, they need to be far apart in the north-south direction because the first turbine will have already used up most of the wind energy. To reduce this problem, you can also alternate between tall and short turbines to catch both the higher-altitude and lower-altitude wind. Now suppose the wind is variable, rather than coming just from one direction. You can come up with a wind map that says 60% of the time, the wind is coming from north to south, 30% northeast to southwest, and 10% east to west. Deciding how far apart to place the turbines now becomes a mathematical optimization problem. You could place them densely along the east-west direction in this case, or far apart in all directions if the land is cheap enough. So we build mathematical models to answer these questions. If we can generate even 20% more energy by correct optimization, then the output increases by 20% for the same costs, which translates to roughly 17% less cost per kilowatt-hour (the same cost divided by 1.2 times the energy), which makes power cheaper, more competitive, and more likely to replace conventional energy like coal. That’s the game we play. There are other people on campus working on wind power, too. Jay Apt, a former astronaut, is one of them.
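Editor’s note: the wind-map reasoning can be turned into a small calculation. The sketch below uses the 60/30/10 split from the answer above and scores each possible row direction by how often, and how directly, the wind blows along the row. The scoring rule (the absolute cosine of the angle between wind and row) is only an assumed stand-in for a real wake model, and the candidate directions and the names wake_exposure and best_row_axis are made up for illustration.

```python
import math

# Wind rose from the interview: the axis the wind blows along, in degrees
# clockwise from north, mapped to the fraction of time it blows that way.
WIND_ROSE = {0: 0.60, 45: 0.30, 90: 0.10}  # N-S, NE-SW, E-W

AXIS_NAMES = {
    0: "north-south",
    45: "northeast-southwest",
    90: "east-west",
    135: "northwest-southeast",
}

def wake_exposure(row_axis_deg, wind_rose):
    """Assumed proxy for wake losses in a tightly packed row.

    1.0 means the wind always blows straight down the row (worst case);
    0.0 means it always blows across the row (neighbours never shade each other).
    """
    total = 0.0
    for wind_axis_deg, fraction in wind_rose.items():
        angle = math.radians(wind_axis_deg - row_axis_deg)
        total += fraction * abs(math.cos(angle))
    return total

def best_row_axis(wind_rose, candidates=(0, 45, 90, 135)):
    """Row direction along which turbines can be packed most densely."""
    return min(candidates, key=lambda axis: wake_exposure(axis, wind_rose))

best = best_row_axis(WIND_ROSE)
print(AXIS_NAMES[best])  # east-west
```

With this wind map the east-west row has the lowest exposure, which agrees with the suggestion to pack turbines densely along the east-west direction and space them out along the others.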
What is your role like as the Director of Language Technologies Institute?
I get to be the chief cook and bottle washer of LTI. Initially, after creating the field, I spent a lot of time hiring people. I still hire people; occasionally, we grow a little or replace someone who went away. Also, I play a big role in fundraising. We get foundations, government agencies, and companies to fund our research here. A role I play with other faculty is to do strategic planning—what research to do within LTI, other researchers to team up with, what research will be a hot topic in academia in the coming decade or so. We want to be ahead of the curve. Playing catch up is a bad idea; someone else will always trump you. Partly, it’s about having a crystal ball to predict which way we are going. Sometimes we are right, sometimes we are not, and sometimes we change the course midway.
Another part of LTI is education. I established the Ph.D., master’s, and undergraduate programs. Robert E. Frederking now heads our graduate educational programs. So those are my administrative duties. They take up about one third of my time when things are going well and two thirds of my time when things aren’t going so well. With the rest of my time, I mostly do the research I mentioned previously. I teach as well, usually only one course or seminar. Other people teach more here; I do the administration rather than the teaching. I also advise students; I have a large number. I have twelve students that I’m advising now, which is too many. When you advise a Ph.D. student, it’s a lot of one-on-one work, doing the research with them and so on.
How did you become an Allen Newell Professor?
Different people donate funds, and that funding creates a chair, meaning a small endowment. Money is paid to the professor who holds the chair to fund research. Every time there is a new chair, senior faculty and department heads and deans select a professor for the chair.
What are your experiences with industrial consulting? What kind of roles do you take on?
I am not involved with industrial consulting right now because I’ve been busy with research. I actually do fairly little industrial consulting. Instead, I help create a lot of spin-off companies. Other faculty members probably do more industrial consulting. I simply don’t have time right now. It’s not that I don’t enjoy it—it’s just that I have to prioritize.
So what are some of the start-up companies you’ve been involved with?
I’ve been involved with a lot of startups, some of them directly, some of them indirectly through my students.
My first significant start-up was Carnegie Group in the mid 1980s. The goal was to get artificial intelligence (AI) and expert systems for intelligent decision-making applied in industry. The company went public and was later bought by a bigger company. It donated enough money to CMU that there’s now a Carnegie Group chair, currently held by Mahadev Satyanarayanan. Carnegie Group was actually founded in conjunction with Raj Reddy and two other professors.
Since then, I’ve been involved with a few others, including Lycos. There, my role was as an advisor. Michael Mauldin did most of the heavy lifting, and Lycos was even more successful than Carnegie Group. Newell Simon Hall was paid for almost entirely by Lycos money, which went to the university. Instead of a professorial chair, they built something more expensive. Mauldin didn’t want to name it after himself or Lycos though; he just had the auditorium named after him (the Mauldin Auditorium) and a Coke machine. He drank a lot of Coke; drinking Coke was one of his favorite activities. We moved out of Newell Simon, so I don’t know what they did with the Coke machine. It used to be in the LTI lounge when we were in the building last year. In fact, it might not even exist anymore, or it’s been taken over by the current tenants.
Another company that I’m involved in now is called Carnegie Speech, and it’s one of the coolest. It’s in the language education space and was founded by Maxine Eskenazi and me. Our first product teaches people how to pronounce English. It has speech recognition embedded into it. It’s fairly detailed and fine-grained. It can tell you “the ‘R’ in the second word sounds like an ‘L’”, or “here’s an exercise to practice”, or “your intonation does not match the English question intonation”. You can also listen to a correct native speaker. It basically presents more exercises for you and measures whether you’re getting better or not. It could say, “Gee, you’re not really catching on, so you’ll have to practice this more tomorrow. I’ll come up with more exercises for you.” Or, “enough already, you nailed this exercise, let’s move on.” It’s essentially a personal tutor, focusing on the spoken language because schools and universities don’t do a good job at it, for the simple reason that there’s one teacher for thirty students. The teacher can’t listen to every individual. Skills like vocabulary and grammar can be taught to the whole class, but individualized feedback on pronunciation is one-on-one. Human tutors are expensive, and one-on-one teaching is not very scalable, but the machine can reach out to everyone individually. Now we have products that teach languages other than English. Carnegie Speech now covers listening comprehension and vocabulary as well. The main difference between our program and others like Rosetta Stone is the pinpoint diagnosis and individualized feedback, where it listens to you carefully and closely. We have some customers right now in the US, India, and Europe, but we’re trying to get the rest of the world to think it’s cool, too.
One application of this software is for international pilots. The “international” language is English, so we’re using this to teach them to speak aviation English correctly. One of the major causes of accidents is miscommunication, so better communication can help prevent these accidents. We just entered the market for that this year. I think it’s cool because I like teaching and I like safety. The product is doing both. If it can make money for the company at the same time, then it is even better. The university owns a part of the company. If it’s wildly successful, I guess we’ll see a Carnegie Speech building. If it’s moderately successful, I guess we’ll see a chair. Otherwise, I guess we won’t have anything.
What do you think is the most important skill you’ve learned?
One skill I use a lot is math. I use it in the optimization and modeling—mostly function approximations, statistics, and continuous math. Many people learn math and never use it again. If I didn’t know it, my career would not have worked at all. Another useful skill that I’ve acquired — unfortunately late in life, and I would have preferred to have learned it earlier — is to listen carefully to what other people are saying and what they really want. It’s not necessarily what they’re saying but what they’re implying. I learned to really infer what the underlying challenge is by asking probing questions. That did not come naturally, and it isn’t something that can be easily taught. It’s learned through practice.
What kind of technology do you think will be most prominent in ten years?
Several technologies. One is miniaturization of what occurs in large machines into handheld ones. For example, your iPhone will be able to translate languages and receive voice commands. (Not just the iPhone of course; just using an example of a hand device with power.) Basically, we’ll be able to access information and issue commands via language and speech, hands-free while driving or working. Computing will become ubiquitous as opposed to being confined to stationary objects like these machines [points to desktop in the office]. That trend is already solidly on its way, not just speculation.
What are your hobbies?
I play chess. I used to do long distance bicycle riding. I do less of it now, but I should do more. I also read science fiction. Some of my favorite authors are Robert Heinlein and Isaac Asimov—the classic ones. Many of the modern ones are good too. Oh, I left out Arthur C. Clarke.
What are some cool places you’ve traveled to that you’ve really enjoyed?
Recently, I traveled to Egypt. I loved seeing not just the pyramids, but also all the ancient sites and tombs. I also go back to Uruguay once every several years. At one time, when I was younger, it was a dictatorship, and it was more problematic to go back. But that ended and now it’s a normal democracy, so it’s no longer a problem.
Working in language technologies, have you had to learn many languages? How many languages do you speak?
I speak Spanish, French, and English fluently. I sort of speak other languages like Italian and Portuguese, and a little bit of German, too. I have studied other languages, like Japanese, but I don’t speak them. If you learn one Romance language, it’s easy to pick up other ones. English less so, but in general, there’s a lot of Latin in its structure. Picking up Japanese would’ve required more effort than I was willing to invest. Watashi wa nihongo wo hanasemasen. That means “I can’t speak Japanese” in Japanese, by the way.
I took Latin in high school. I proceeded to forget it later, but I guess I can still read it. I never did learn ancient Greek though, just Latin.
Favorite food or type of food?
My problem is that all foods are my favorite! In fact, I need to come up with a way of cutting back. It’s really true. I like to travel and try local cuisine. I’ve been to India, specifically southern India, and I loved to try all the different kinds of food there. Some of it was too spicy for me. I’ve also been to Korea. Interestingly enough, the small towns in Korea have slightly different food than what we get in restaurants. I love the geography of Korea: the mountains and lakes. I’ll be going back there in two weeks for a business trip.