An Interview with C++ Creator Bjarne Stroustrup

by Danny Kalev

Bjarne Stroustrup talks about the imminent C++0x standard and the forthcoming features it brings, the difficulties of standardizing programming languages in general, the calculated risks that the standards committee can afford to take with new features, and even his own New Year's resolutions.

With the C++0x standardization process about to reach its final technical vote, we sit down with C++ creator Bjarne Stroustrup to talk about C++0x, new features and future plans.

Danny Kalev: Where is the C++0x standardization process standing these days? How close are we to a new International Standard?

Bjarne Stroustrup: The final technical vote is scheduled for March 26, 2011. I see no reason for that to fail. After that, there are formal national ballots and ISO bureaucratic delays, but I'm pretty confident that the official standard will be available in 2011.

Danny Kalev: In terms of core features and library enhancements, which good tidings does the C++0x bring to the typical C++ programmer? Which aspects of C++0x make you particularly proud? Finally, what are your recommended techniques for learning the new features of C++0x, considering that C++0x textbooks are still a rarity?

Bjarne Stroustrup: The improvements to C++ come in the form of many small and incremental improvements, rather than a few "marquee" features. I guess some of the improvements won't seem minor to many, but that's even better and doesn't detract from my key point: The C++0x improvements are pervasive; they help in many places in many kinds of code, rather than being isolated to a few new components. The way I think of it is that I'm getting a lot of new kinds of "bricks" to form my "building set" so that I can build many things that I couldn't easily build before, and far more easily and elegantly. In fact, it is hard for me to think of any program that I can't write a bit easier and more elegantly in C++0x than in C++98 - and typically have it perform better also.

Let me first mention a couple of near-trivial changes that make life just a bit easier for the C++ Programmer. Consider:

		  void f(vector<pair<string,int>>& vp)
		  {
		      struct is_key {
		          string s;
		          bool operator()(const pair<string,int>& p)
		              { return p.first == s; }
		      };
		      auto p = find_if(vp.begin(), vp.end(), is_key{"simple"});
		      // …
		  }

This doesn't look particularly new or exciting, but I count four little "things" that are not C++98:

  1. There isn't a space between > and > in vector<pair<string,int>>. I rate this the smallest improvement in C++0x; it saves the programmer from having to add that "spurious" space.
  2. I declared p without explicitly mentioning the type of p. Instead, I used auto, which means "use the type of the initializer." So p is a vector<pair<string,int>>::iterator. That saves a fair bit of typing and removes the possibility of a few kinds of bugs. This is the oldest C++0x feature; I implemented it in 1983, but was forced to take it out for C compatibility reasons.
  3. The local struct is_key is used as a template argument type. Maybe you didn't even notice, but that was not allowed in C++98.
  4. Finally, I made a key using the initializer {"simple"}. In C++98, that could only be done for a variable and not as a function argument. C++0x offers uniform and universal initialization using the {...} notation.

We might simplify this example further by using a lambda expression:

		  void f(vector<pair<string,int>>& vp)
		  {
		      auto p = find_if(vp.begin(), vp.end(),
		          [](const pair<string,int>& p) { return p.first=="simple"; });
		      // …
		  }

The lambda notation is an abbreviation for a function object definition and use. Here, we simply say that the predicate required by find_if() takes a pair argument and compares its first element to "simple".
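For readers who want to compile the example, here is a minimal self-contained sketch of the same search. The names Entry and find_key, and the idea of passing the key as a parameter, are illustrative assumptions rather than part of the interview. The capture list [&key] shows how a lambda can refer to a local variable, which the hand-written is_key struct did by storing s.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Hypothetical helper: returns an iterator to the first entry whose
// first element equals key, or vp.end() if there is no such entry.
std::vector<Entry>::iterator find_key(std::vector<Entry>& vp, const std::string& key)
{
    // [&key] captures the parameter by reference; it plays the role
    // of the member s in a hand-written function object.
    return std::find_if(vp.begin(), vp.end(),
        [&key](const Entry& p) { return p.first == key; });
}
```

A caller compares the result with vp.end() to learn whether the key was found.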

OK, such minor improvements are all very good, but what about more major issues?

  • Direct and type safe support for the traditional threads-and-locks style of system-level concurrency. Together with a detailed memory model and facilities for lock-free programming, this provides support for portable and efficient concurrency.
  • A higher-level concurrency model based on asynchronously launched "tasks" communicating through message buffers called futures.
  • A regular expression standard library component
  • Hashed containers
  • Move semantics and the use of move semantics in the standard library. In particular, we can now return large objects from functions by value. For example:

      vector<int> make_vec(int n)
      {
          vector<int> res;
          for (int i=0; i<n; ++i) res.push_back(rand_int(0,100000));
          return res;
      }

    The standard library's vector has a "move constructor" that simply transfers the vector's representation rather than copying all the elements. This implies that the return is accomplished by something like six assignments (rather than, say, a million), so that we don't have to fiddle with pointers, references, free store allocation and deallocation, etc. This provides a whole new paradigm for passing large objects. In particular, it makes it trivial to implement and use value producing operations that need to be efficient, such as a matrix multiply:

      Matrix operator*(const Matrix&, const Matrix&);
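Two of the library items listed above, regular expressions and hashed containers, are easy to show together in a few lines. The following is only a sketch under stated assumptions: the function name count_words and the pattern (runs of ASCII letters) are invented for illustration. It also returns its result by value, relying on the move semantics just described.

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical helper: counts how often each word occurs in text.
// A "word" is assumed here to be a run of ASCII letters.
std::unordered_map<std::string, int> count_words(const std::string& text)
{
    std::unordered_map<std::string, int> counts;   // hashed container
    const std::regex word("[A-Za-z]+");             // one or more letters

    // sregex_iterator visits every non-overlapping match in the string;
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), word), end; it != end; ++it)
        ++counts[it->str()];

    return counts;   // returned by value: moved or elided, not copied
}
```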

I could go on for a while, but long lists get tedious and anyway, this leads us to the second and more interesting part of the question: How will people learn to use all of this well? I'm working on a 4th edition of The C++ Programming Language, but that's a huge amount of work. It will take at least another year. I'm sure other authors are writing or thinking about starting to write, but it will be a while before we have quality books and teaching materials for experts and novices. We are lucky to have a good early source of information related to concurrency: Anthony Williams: C++ Concurrency in Action - Practical Multithreading. Another early source is my C++0x FAQ. That FAQ presents short examples of uses of most C++0x language features and standard-library facilities and references to currently available material. However, we need more than FAQs and online documentation. We need coherent explanations of how to use the facilities in support of good programming. For that, we need textbooks.

When I wrote Programming: Principles and Practice using C++, I was simultaneously working on C++0x and it was painful not to be able to use C++0x features. I confidently predict that C++0x will be a boon to education/learning. The support for good programming techniques and styles is so much better now. For example, we now have a uniform and universal mechanism for initialization. We can use the {...} notation for every initialization and wherever we initialize an X with {v} we get the same resulting value. That's a major improvement over C++98's non-uniform set of alternatives using the =v, ={v}, and (v) notations:

  vector<double> v = { 1,2,3,4};	// a user-defined type
  double a[] = { 1,2,3,4};		// an aggregate
  int f(const vector<double>&);
  int x = f({1,2,3,4});
  auto p = new vector<double>{1,2,3,4};
  struct S { double a, b; };
  S s1{1,2};			// has no constructor
  complex<double> z { 1,2,};	// has constructor

Before we get to benefit from the simplifications offered by C++0x, we may go through a period where too many people try to show off their cleverness by enumerating language rules and digging into the most obscure corners. That can do harm.

We cannot expect people to gain a good understanding of C++0x programming just from reading. People have to actually use the new features. Given that, it is good that implementations of many of the C++0x features are currently shipping (e.g. in GCC and Microsoft C++) as are essentially all of the new standard-library components. C++0x is not Science Fiction!

Danny Kalev: The C++0x standard is three years behind schedule. The Perl 6 and Java SE 7 releases also exhibited significant delays. It seems that standardization projects these days take much longer than before. Is it something inherent to the programming languages of the 21st century? Is it the standardization process itself? Or perhaps this has always been the case?

Bjarne Stroustrup: The languages themselves are larger than programming languages used to be; the libraries that come with them even more so. Also, the code bases that must not be broken are huge. On top of that, the cost/time of dealing with complexity does not go up linearly with size. I suspect that the difficulty goes up at least quadratically because of the need to consider all possible interactions among language and standard library features.

The nature of the standards processes has changed to take that into account. Furthermore, the use of online communications has increased the number and variety of people who can take part. Two decades ago, the number of people who could take part was basically limited to those with major corporate or university funding and with time to spend weeks at meetings and months at home preparing. My guess is that between 100 and 200 people had an active role in shaping C++98. Maybe four times as many people took similar active roles in the C++0x effort. The extra manpower is most welcome and the work of so many people is essential for ironing out problems. However, it is always easier to prevent a change than to make one, the larger group is less homogeneous in aims and backgrounds, and the trust necessary for progress is harder to establish in a large group and over the web than in a smaller group repeatedly meeting face to face. Also, fewer people understand or care for the whole language and the whole user community - it is always easier to focus on a small subset and there is a danger of dismissing the concerns of communities you don't know as irrelevant or even "simply wrong."

The sum of these two effects is an increasingly conservative bias. Maybe I should be amazed that we achieved as much as we did. Getting an idea (a "vision," if you must) through the technical and philosophical obstacles in the committee can be infuriatingly slow and difficult. In particular, FUD is a powerful weapon because it is essentially impossible to prove that a new feature will provide significant benefits when deployed at scale, whereas it is usually easy to imagine potentially serious problems.

Danny Kalev: You have been a professor at Texas A&M University for several years now. Which new insights and lessons regarding the design of programming languages has the communication with fresh CS students brought to you?

Bjarne Stroustrup: Not as much as I had hoped. Students generally have a somewhat shallow view of programming and programming languages. Bringing students (both undergraduates and graduates) up to date with modern problems and techniques is hard and time consuming. Also, academic language research tends to be rather far removed from practical systems building. I guess that the main effect of my academic and educational work has been an increased appreciation of the needs and limitations of novices (of all backgrounds). You can't just expect them to learn all they need before starting to program; instead, you have to work on giving them sufficient help so that they can manage until they learn more. It is important that the initial "sub-languages" that a novice can master in a reasonable time (weeks or months, rather than years) don't trap the students into a 1970s or a 1980s view of programming.

One solid practical result of my work as a teacher is my book for beginners: Programming: Principles and Practice using C++. That book grew from materials I and my colleagues used teaching a couple of thousand students over several years. I consider it a far better introduction to real-world programming than conventional C++ or programming textbooks. Programming is a high-level skill involving both fundamental principles and practical techniques - and should be taught as such: neither as just a set of low-level practical skills nor as just a theoretical framework.

Danny Kalev: Back to the thorny issue of concepts. In our previous interview, you referred to the removal of concepts (in July 2009) as a "big deal". Are you still researching an alternative type-constraining mechanism for templates? In hindsight, would you design concepts differently today?

Bjarne Stroustrup: Yes, I would like to get something for precisely specifying the requirements of templates. I still think that this would be among the greatest improvements we could make to the C++ language. However, any such "concepts" would have to scale to industrial use. They would have to be usable by millions of programmers (not just programming language specialists), not impose run-time or compile-time overheads (even in huge separately compiled programs), and be sufficiently compatible to be useful in programs mixing new-style and old-style templates. The design we had for C++0x didn't meet those stringent criteria. Neither do the various schemes currently deployed in other (research or production) languages that I know of.

Yes, I have some ideas, but it will be much more work before I could claim to have something that is close to the criteria from the previous paragraph. Some of the work is at the fundamental, conceptual, research level; other parts of the work are detailed engineering of user interfaces and compilation models. A minor adjustment to what we had for C++0x is not enough.

Danny Kalev: Following the introduction of rvalue references, the value system of C++0x now includes prvalues, xvalues, glvalues etc. Additionally, C++0x has new overload resolution rules, new template deduction rules, new cast rules, two new special member functions: a move constructor and a move assignment operator, and as a side-effect of the latter -- noexcept. Aren't rvalue references becoming too pervasive and perhaps out of hand? Was this foreseen when rvalue references were first added to the Working Draft?

Bjarne Stroustrup: It is rare for someone to foresee every implication of an improvement and hardly anyone ever looks at every detail in the standard. I think that if you list every detail of even the most widely used and familiar feature, you will have people recoil in horror and surprise. That's basically the case for every non-toy language. The specification necessary for making implementers agree is amazingly detailed and complicated. Languages with a single implementation appear simpler because people basically don't look at all the details and don't precisely outline what each part of the implementation is supposed to do - the implementation simply says what it does.

My impression is that rvalue references were meant to become pervasive because move semantics must be pervasive to be of major use. I don't have much problem explaining move semantics. Explaining the other main use of rvalue references, perfect argument forwarding, is a bit harder because the key examples are tied up with template metaprogramming, which many people consider too complicated by itself. The fact that the perfect forwarding offered by rvalue references simplifies template metaprogramming is lost on those who have already given up on template metaprogramming.
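Perfect forwarding can be illustrated without any template metaprogramming. The sketch below is an assumption-laden example, not code from the interview: make is a hypothetical factory, and Label is a hypothetical type that records which of its constructors was selected. The factory hands each argument to the constructor with its value category intact, so an lvalue is copied from and a temporary is moved from.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical factory: constructs a T on the free store, passing each
// argument on to T's constructor exactly as the caller supplied it.
template<typename T, typename... Args>
std::unique_ptr<T> make(Args&&... args)
{
    // std::forward keeps lvalues as lvalues and rvalues as rvalues.
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Hypothetical type that records which constructor was chosen.
struct Label {
    std::string text;
    bool moved_in;
    explicit Label(const std::string& s) : text(s), moved_in(false) {}
    explicit Label(std::string&& s) : text(std::move(s)), moved_in(true) {}
};
```

Calling make<Label>(name) with a named string selects the copying constructor and leaves name untouched, while make<Label>(std::string("tmp")) selects the moving one.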

Precisely specifying every detail and every possible interaction with the rest of the language and standard library is much harder. To do so, we had to invent some new terminology, but I expect that extremely few programmers will have to worry about or even know about that. Many people were shaking their heads when Christopher Strachey invented "lvalue" and "rvalue" to describe C++'s ancestor CPL, thinking that the need to mess with the English language proved that complexity had gotten out of hand.

I consider noexcept a separate issue from rvalue references; I see noexcept primarily as a simpler and workable alternative to the failed exception specifications.

Danny Kalev: So overall, do you consider rvalue references a worthwhile addition to C++0x? What should the typical C++ user expect to get from this feature other than performance improvements, e.g., cleaner designs, simpler algorithms etc. -- or is it a feature that will mostly affect library designers and compiler writers?

Bjarne Stroustrup: I see rvalue references as more than just worthwhile; I see move semantics - which is one of the two major use cases for rvalue references - as a solution to a long-standing problem: How do we return a large data structure from a function? Move semantics gives us the "simple, obvious, and efficient" answer: just move the result out of the function; don't copy the result, don't fiddle around with fancy memory management schemes, don't use messy special-purpose memory management schemes, don't require the caller to preallocate memory, don't require values to be passed using "extra arguments," don't require some form of garbage collection. This will affect everyone - for the better. Move semantics simply eliminates the need for a lot of cleverness.

Please note that writing a move operation is typically trivial. This is not rocket science:

  class Matrix {
      double* elem;	// pointer to elements
      int dim1, dim2;
  public:
      Matrix(Matrix&& a)	// move constructor
                :elem(a.elem), dim1(a.dim1), dim2(a.dim2)
          { a.elem=nullptr; a.dim1=0; a.dim2=0; }
      // …
  };

That's it: move the value and leave an empty Matrix behind. Given that, you can return 10000-by-10000 matrices simply and efficiently.
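To see the effect end to end, here is a minimal compilable sketch built around the same move constructor. Everything beyond that constructor is an assumption made for illustration: the sizing constructor, the deleted copy operations, the accessors rows(), cols() and data(), and the producer function make_matrix.

```cpp
#include <cstddef>
#include <utility>

// Minimal Matrix sketch: owns a dim1-by-dim2 array of doubles.
class Matrix {
    double* elem;        // pointer to elements
    int dim1, dim2;
public:
    Matrix(int d1, int d2)   // allocate zero-initialized elements
        : elem(new double[static_cast<std::size_t>(d1) * d2]()), dim1(d1), dim2(d2) {}

    Matrix(const Matrix&) = delete;              // copying is left out of this sketch
    Matrix& operator=(const Matrix&) = delete;

    Matrix(Matrix&& a)                           // move: take over the representation
        : elem(a.elem), dim1(a.dim1), dim2(a.dim2)
        { a.elem = nullptr; a.dim1 = 0; a.dim2 = 0; }

    ~Matrix() { delete[] elem; }                 // delete[] on nullptr does nothing

    int rows() const { return dim1; }
    int cols() const { return dim2; }
    const double* data() const { return elem; }
};

// Hypothetical producer: returns a large Matrix by value.
Matrix make_matrix(int n)
{
    Matrix m(n, n);
    return m;            // moved (or elided), never copied: copying is deleted
}
```

Because the copy operations are deleted, the function compiles only because the result can be moved, and the destructor of the emptied source object is harmless.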

Naturally, library writers will also have a field day - there are so many cases where move can be used to simplify library implementations and probably many more where forwarding (the other main use case for rvalue references) helps in library design and implementation, but move semantics and perfect forwarding are not esoteric "expert only" techniques.

Danny Kalev: Although rvalue references were first proposed in 2002, the FCD still doesn't address fundamental aspects of move semantics such as the state of a moved-from object (this is needed to enable compiler-generated move constructors, optimize certain algorithms etc.) What are the contention-points regarding the state of a moved-from object? How close are we to a consensus, if any?

Bjarne Stroustrup: I think we have a consensus (post-Fermilab-meeting, November 2010). The votes bear out that view. We have had lively discussions but all significant issues are settled. The state of a moved-from object is settled: by default it is what you get from memberwise moves; if you define your own move, you should leave the moved-from object in a valid-but-unspecified state (exactly as if you were about to throw an exception, and for the same reasons). The rules for generation of constructors and assignments have been tightened to minimize surprises.

Consensus means a large majority, not unanimity. There will always be some who prefer a different solution and sometimes they express themselves with less restraint than one might prefer (especially in the heat of an argument before an issue is settled), but I think that the standard has converged to something everyone can live with and most committee members - if not all - are convinced that the current draft standard is better than essentially all feasible alternatives. Reasonable people understand that not all proposals can be accepted. Even I don't get all that I want - not anywhere near! What matters most is that C++0x is a much better tool for software development than C++98.

Danny Kalev: In the past few years, you have supported several features that aim to improve performance: SCARY iterators, noexcept, and implicitly-defined move constructors. Some committee members were opposed to these proposals because they feared that they might compromise code security and prolong the C++0x standardization process. Is performance still so important these days? How much risk should the committee take with features that haven't been fully tested before?

Bjarne Stroustrup: Certainly performance is important to C++. C++ is disproportionally used where performance really matters. It is performance (as well as flexibility) that gives C++ such a massive presence in high-end systems (e.g. Google, Amazon, and Amadeus), in embedded systems (e.g., cell-phones, cars, and planes), and browsers and virtual machines (e.g. Chrome, V8, Havok, and Hotspot).

There seem to be standard ways to oppose anything new. Fear of security problems is part of that arsenal, but I see nothing in the proposals that I have supported that worsens the problems of C++ vis a vis security. Of the features you mention, SCARY has no security implications whatsoever - it is simply a technique for implementing algorithms with less overhead than is possible with over-constrained nested types. In fact, it is one of the two major current ways of implementing iterators for standard containers - the proposal was simply to require the more efficient and more flexible alternative to be consistently used by implementers. Even then, SCARY failed because many felt, at least in this case, that the committee should not limit the choices traditionally available to implementers. However, I expect that the data is sufficiently convincing that soon every standard library implementation will support SCARY.

noexcept is, if anything, safer than its C++98 alternatives (use of exception specifications or home-brew error-handling schemes) as well as faster.

"How much risk should the committee take" is a big and difficult question as is "what kind of risks should the committee take?" Essentially every new feature or library implies some risk: it could be risk of breakage of existing code, risk of confusion, risk of damaging a business by eliminating the need of a product, risk of breakage of tools, risk of invalidating educational materials, etc. Making no changes would seem to be 100% safe, but would lead to a stale user community, inferior support for newer programming techniques, and eventually lead to billions of lines of code being rewritten at enormous cost. Doing nothing is not risk free. My opinion is that doing nothing guarantees failure in the medium term, implying that to minimize risk, we must take many calculated risks.

I don't think that any significant feature can be completely tested - that would involve deploying it for a few years to a diverse community of thousands. Such an "experiment" would look more like an attempt to force a dialect on the community as a whole or an attempt to create a lock-in mechanism. We have to make do with smaller experiments, consideration of many alternatives, lots of discussion, and work on the standard text.

I think that much debate about risk is misdirected. People mostly worry about breakage of old code and implementability. Typically the greater danger is poor design. It is really hard to ensure that a new feature is sufficiently general and sufficiently easy to teach, learn, and use. Fear of novelty often leads to overly timid language extensions with overly elaborate syntax and/or semantics. We need relatively more discussion about design and more work on use cases, but most people seem to be more comfortable discussing technical details and implementation techniques.

Danny Kalev: Let me pin down the issue of calculated risks: should there be a different standardization process for isolated features of minimal interaction with other features (say class member initializers), as opposed to pervasive features that affect almost every aspect of C++? Allegedly, the former can be excised from the language rather easily should they fail (e.g., exported templates) whereas the latter are less flexible -- every small modification affects the entire language, libraries etc. Or is it too naive to assume that such a division is possible?

Bjarne Stroustrup: It's hard to pin down risk. People really do differ in what they consider to be a risk and in their tolerance of risk.

It is hard to cleanly separate features into "localized" and "pervasive." The ideal is for a feature to interact cleanly with essentially every other feature in the language. In many cases, the alternative is duplication of functionality. Ideally, we have only one notion of name lookup, only one notion of expression, only one notion of scope, only one notion of initialization. Whenever we succeed at this ideal, we get a minimal language and any change to that feature potentially affects every other feature in the language.

It is rather similar for standard libraries: Ideally the standard containers should be used by other standard library components to hold data and the standard algorithms should be used in the implementations of other library components. When we see alternative container libraries used or libraries managing data through direct use of free store and pointer manipulation we are seeing forms of failure.

So, my conclusion is that you can minimize risk relative to a language feature or standard-library component only by not building upon it within the language or standard-library. However, by doing so you maximize replication/redundancy in the standard, maximize the size of the standard, and could come close to maximizing the implementation and learning effort. To use a different terminology, we want the standard to be strongly cohesive and that makes a loose coupling of different parts of the standards process essentially impossible. In particular, in the standards committee, we always need more people who care for the whole language (rather than just the subset used by their favorite developer community) and who understand the whole language (at some suitable depth).

Danny Kalev: Let's talk about noexcept, a feature that is in the FCD. C++ programmers might wonder why noexcept is needed when there's already a throw() specification to designate a function that shouldn't throw (let's ignore for a moment the recent deprecation of exception specifications). What is the main advantage of noexcept over throw()? Considering that the static checking of noexcept is limited, doesn't this feature introduce new code security risks?

Bjarne Stroustrup: No, noexcept does not open a security hole; rather, compared to throw(), it closes a few. CERT supported noexcept as approved by the committee. There are people who want exception throws statically checked. I'm not among those: No language has succeeded in providing a system of static checking that is not crippling for large systems, inefficient, or easily bypassed. The exception specifications of C++98 were a compromise design that we would have been better off without. They are now deprecated, so don't use them. They lead to efficiency problems and surprises. noexcept addresses the one case where the exception specifications sometimes worked well: simply stating that a function is not supposed to throw. In C++98, some people express that by saying throw(), but they cannot know whether their implementation then imposes a significant overhead (some do) and might end up executing a (potentially unknown) unexpected handler. With noexcept, a throw is considered a fatal design error and the program is immediately terminated. That gives security and major optimization opportunities.

The availability of noexcept should lead to heavier use of exceptions and to safer code.
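A short sketch may help show what the noexcept promise looks like in code. The two functions below, checked_add and checked_div, are hypothetical names chosen for illustration. The noexcept operator used in the static_assert lines reports the promise at compile time without evaluating the call.

```cpp
#include <stdexcept>

// Promises not to throw. If an exception did escape from this
// function, the program would be terminated via std::terminate.
int checked_add(int a, int b) noexcept
{
    return a + b;
}

// Makes no such promise: callers must assume it may throw.
int checked_div(int a, int b)
{
    if (b == 0) throw std::invalid_argument("division by zero");
    return a / b;
}

// The noexcept operator is evaluated at compile time; the calls
// inside it are never executed.
static_assert(noexcept(checked_add(1, 2)), "checked_add is declared noexcept");
static_assert(!noexcept(checked_div(1, 2)), "checked_div may throw");
```

Generic code can use the same operator to choose a faster path when an operation is known not to throw.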

Danny Kalev: The difficulties associated with the design of some C++0x features including rvalue references, lambda expressions and of course concepts lead critics to claim that C++ is too old and inflexible. Is there a grain of truth in this claim? Will there come a time when you decide that C++ can no longer be extended and improved, and that a new programming language is required instead?

Bjarne Stroustrup: I more often hear the claim that C++ is too flexible and of course that it is too large. New languages tend to be simple because they don't yet serve a large community. All languages grow with time. Of course it is harder to modify a large, old, and massively used language than coming up with something new. However, most new languages die in infancy and many of the new simple ideas turn out to be just too simplistic for real-world use. Adding to C++ is difficult and the process to get a new feature accepted is typically most painful for its proposer. However, once accepted, the new feature can have major impact on a large community. If I didn't want to have an impact on the world, I could try to get my intellectual stimulation through crossword puzzles, writing fiction, or designing a toy programming language.

Of course, I dream of designing a new, smaller, and better language than C++. But each time I have looked at the problems (to be solved by a new language) and the likely impact of the new language, I have decided that most of what could be achieved through a new language could be done within C++ and its standard library. The odds of making a positive impact on the programming world are - for me at least - much better through the tedious route through C++ than through the design, implementation, and popularization of a new language.

Danny Kalev: Regarding Unicode support, the C++0x standard includes char16_t and char32_t as well as u16string and u32string to work with UTF-16 and UTF-32 encoded Unicode strings. However, the standard library doesn't support these in streams. For example, there is no u16cout or u32cout. I'm wondering, how can we use char16_t strings and write them to standard output?

Bjarne Stroustrup: Obviously, we ought to have Unicode streams and other much extended Unicode support in the standard library. The committee knew that but didn't have anyone with the skills and time to do the work, so unfortunately, this is one of the many areas where you have to look for "third party" support. There are libraries "out there" with good support for Unicode. For example, the Poco libraries "for building network- and internet-based application" (http://pocoproject.org/index.html) are available for download under the Boost open-source license. There is also Unicode support (somewhere) in the Microsoft C++ support libraries.

It is unfortunate that something as fundamental as Unicode library support is not in the standard library, but in general, we have to remember that most libraries are not and can't be in the standard library. My C++ page contains links to many libraries, to collections of libraries, and to lists of libraries. One estimate is that there are over 10,000 C++ libraries "out there" (both commercial and open-source). The problem is to find them and evaluate them.
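For simple cases, one workaround that needs no external library is to convert the UTF-16 string to UTF-8 by hand and write the bytes to an ordinary narrow stream. The sketch below is not the approach recommended in the interview, which points to third-party libraries; the function name to_utf8 is hypothetical, and the code assumes that whatever reads standard output expects UTF-8. Unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 to UTF-8 so that the result can be
// written to a narrow stream such as std::cout.
std::string to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {              // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                             // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                 // stray low surrogate
        }

        if (cp < 0x80) {                                 // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this in place, std::cout << to_utf8(s) writes a u16string s as UTF-8 bytes. Anything beyond plain transcoding, such as normalization or collation, is the kind of work better left to a dedicated library.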

Danny Kalev: Finally, what are your New Year's resolutions?

Bjarne Stroustrup:

  • To get C++0x formally approved as an ISO standard.
  • To produce a good first draft of The C++ Programming Language (4th Edition).
  • To spend more time with my grandchildren.
  • To have at least one interesting new technical insight.

This article was originally published on Monday Jan 10th 2011