Python 3 Q & A

Published: 29 June 2012
Last Updated: 11 February 2020

Throughout the long transition to “Python 3 by default” in the Python ecosystem, the question was occasionally raised as to whether or not the core Python developers were acting as reasonable stewards of the Python language.

While it largely stopped being a concern after the release of Python 3.5 in September 2015, it was an entirely appropriate question prior to that, as Python 3 introduced backwards incompatible changes that more obviously helped future users of the language than they did current users, so existing users (especially library and framework developers) were being asked to devote time and effort to a transition that would cost them more in time and energy in the near term than it would save them for years to come.

Since I had seen variants of these questions several times over the years, I started this FAQ as an intermittently updated record of my thoughts on the topic, with updates generally being prompted by new iterations of the questions. I gave Sumana Harihareswara co-maintainer access in September 2019 so she could aid in updating it, but for simplicity’s sake will retain the first-person singular (“I”) throughout. You can see the full history of changes in the source repo.

The views expressed below are my own. While many of them are shared by other core developers, and I use “we” in several places where I believe that to be the case, I don’t claim to be writing on the behalf of every core developer on every point. Several core developers (including Guido) have reviewed and offered comments on this document at various points in time, and aside from Guido noting that I was incorrect about his initial motivation in creating Python 3, none of them has raised any objections to specific points or the document in general.

I am also not writing on behalf of the Python Software Foundation (of which I am a nominated Fellow), nor on behalf of Python’s Steering Council (of whom I am a member), nor on behalf of Red Hat (my previous employer, for whom I worked for much of the time I maintained this). However, I do use several Red Hat specific examples when discussing enterprise perception and adoption of the Python platform - effectively bridging that gap between early adopters and the vast majority of prospective platform users is kinda what Red Hat specialises in, so I consider them an important measure of the inroads Python 3 is making into more conservative development communities.

There were several extensive discussions of the state of the Python 3 transition at PyCon US 2014 in Montreal, starting at the language summit, and continuing throughout the conference. These helped clarify many of the remaining points of contention, and resulted in a range of changes to Python 3.5, Python 2.7, and the available tools to support forward migration from Python 2 to Python 3. These discussions didn’t stop, but have rather continued over the course of Python development, and can be expected to continue for as long as folks are developing software that either fits into the common subset of Python 2 & 3, or else are having to maintain software that continues to run solely under Python 2.

Note

If anyone is interested in writing about these issues in more formal media, please get in touch to check if particular answers are still accurate. Not only have the updates over the years been intermittent, they’ve also been less than completely comprehensive, so some answers may refer out to experiments that ultimately proved uninteresting or unsuccessful, or otherwise be out of date.

As with all essays on these pages, feedback is welcome via the issue tracker or Twitter.

TL;DR Version

  • Yes, we know this migration was/is disruptive.
  • Yes, we know that some sections of the community had never personally experienced the problems with the Python 2 Unicode model that this migration was designed to eliminate, or otherwise preferred the closer alignment between the Python 2 text model and the POSIX text model.
  • Yes, we know that many of those problems had already been solved by some sections of the community to their own satisfaction.
  • Yes, we know that by attempting to fix these problems in the core Unicode model we broke many of the workarounds that had been put in place to deal with the limitations of the old model.
  • Yes, we are trying to ensure there is a smooth migration path from Python 2 to Python 3 to minimise the inevitable disruption.
  • Yes, we know some members of the community would have liked the migration to move faster and found the “gently, gently, there’s no rush” approach of the core development team frustrating.
  • No, we did not do this lightly.
  • No, we did not see any other way to ensure Python remained a viable development platform as developer communities grow in locations where English is not the primary spoken language. It should be at least possible for users to start learning the basics of Python without having to first learn English as a prerequisite (even if English remains a requirement for full participation in the global Python and open source ecosystems).

It is my perspective that the web and GUI developers have the right idea: dealing with Unicode text correctly is not optional in the modern world. In large part, the Python 3 redesign involved taking Unicode handling principles elaborated in those parts of the community and building them into the core design of the language.

Why was Python 3 made incompatible with Python 2?

According to Guido, he initiated the Python 3 project to clean up a variety of issues with Python 2 where he didn’t feel comfortable with fixing them through the normal deprecation process. This included the removal of classic classes, changing integer division to automatically promote to a floating point result (retaining the separate floor division operation) and changing the core string type to be based on Unicode by default. With a compatibility break taking place anyway, the case was made to just include some other changes in that process (like converting print to a function), rather than going through the full deprecation process within the Python 2 series.

If it had just been about minor cleanups, the transition would likely have been more straightforward, but also less beneficial. However, the changes to the text model in Python 3 are one of those ideas that has profoundly changed the way I think about software, and we receive similar feedback from many other users that never really understood how Unicode worked in Python 2, but were able to grasp it far more easily in Python 3. Redesigning the way the Python builtin types model binary and text data has the ultimate aim of helping all Python applications (including the standard library itself) to handle Unicode text in a more consistent and reliable fashion (I originally had “without needing to rely on third party libraries and frameworks” here, but those are still generally needed to handle system boundaries correctly, even in Python 3).

Note

For a more complete version of this answer that places it in the wider industry context of Unicode adoption, see this article of mine on the Red Hat Developer Blog: The Transition to Multilingual Programming with Python

I also gave a presentation on the topic at PyCon Australia 2015, which is available online here

The core Unicode support in the Python 2 series has the honour of being documented in PEP 100. It was created as Misc/unicode.txt in March 2000 (before the PEP process even existed) to integrate Unicode 3.0 support into Python 2.0. Once the PEP process was defined, it was deemed more appropriate to capture these details as an informational PEP.

Guido, along with the wider Python and software development communities, learned a lot about the best techniques for handling Unicode in the six years between the introduction of Unicode support in Python 2.0 and the inauguration of the python-3000 mailing list in March 2006.

One of the most important guidelines for good Unicode handling is to ensure that all encoding and decoding occurs at system boundaries, with all internal text processing operating solely on Unicode data. The Python 2 Unicode model is essentially the POSIX text model with Unicode support bolted on to the side, so it doesn’t follow that guideline: it allows implicit decoding at almost any point where an 8-bit string encounters a Unicode string, along with implicit encoding at almost any location where an 8-bit string is needed but a Unicode string is provided.
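
To make that guideline concrete, here's a minimal Python 3 sketch of the “decode at the input boundary, process as text, encode at the output boundary” pattern (the byte values and encodings are purely illustrative):

# bytes received from a wire protocol or read from a binary file
raw = b"caf\xe9"
text = raw.decode("latin-1")          # decode once, at the incoming boundary
cleaned = text.upper()                # internal processing operates purely on str
outgoing = cleaned.encode("utf-8")    # encode once, at the outgoing boundary
print(cleaned, outgoing)              # CAFÉ b'CAF\xc3\x89'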

One reason this approach is problematic is that it means the traceback for an unexpected UnicodeDecodeError or UnicodeEncodeError in a large Python 2.x code base almost never points you to the code that is broken. Instead, you have to trace the origins of the data in the failing operation, and try to figure out where the unexpected 8-bit or Unicode string was introduced. By contrast, Python 3 is designed to fail fast in most situations: when a UnicodeError of any kind occurs, it is more likely that the problem actually does lie somewhere close to the operation that failed. In those cases where Python 3 doesn’t fail fast, it’s because it is designed to “round trip” - so long as the output encoding matches the input encoding (even if it turns out the data isn’t properly encoded according to that encoding), Python 3 will aim to faithfully reproduce the input byte sequence as the output byte sequence.

The implicit nature of the conversions in Python 2 also means that encoding operations may raise decoding errors and vice-versa, depending on the input types and the codecs involved.

A more pernicious problem arises when Python 2 doesn’t throw an exception at all - this problem occurs when two 8-bit strings with data in different text encodings are concatenated or otherwise combined. The result is invalid data, but Python will happily pass it on to other applications in its corrupted form. Python 3 isn’t completely immune to this problem, but it should arise in substantially fewer cases.
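
A small sketch of how that kind of corruption arises, and how decoding at the boundary avoids it (the sample text and encodings are just illustrative):

# The same word encoded two different ways
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'

# Concatenating the raw bytes produces data with no single valid encoding
mojibake = utf8_bytes + b" / " + latin1_bytes
print(mojibake.decode("utf-8", errors="replace"))   # café / caf�

# Decoding each chunk with its own encoding first keeps the text intact
fixed = utf8_bytes.decode("utf-8") + " / " + latin1_bytes.decode("latin-1")
print(fixed)                                         # café / café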

The general guiding philosophy of the text model in Python 3 is essentially:

  • try to do the right thing by default
  • if we can’t figure out the right thing to do, throw an exception
  • as far as is practical, always require users to opt in to behaviours that pose a significant risk of silently corrupting data in non-ASCII compatible encodings
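
As a short illustration of that philosophy in practice (the exact error message varies slightly between Python 3 versions):

# Mixing text and binary data without an explicit encoding step fails
# immediately, rather than silently guessing an encoding as Python 2 did
try:
    greeting = "prefix: " + b"\xe9\x8a\xb1"
except TypeError as exc:
    print(exc)   # e.g. can only concatenate str (not "bytes") to str

# The error is raised at the point where the types are mixed, so the
# traceback points at the offending operation itself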

Ned Batchelder’s wonderful Pragmatic Unicode talk/essay could just as well be titled “This is why Python 3 exists”. There are a large number of Unicode handling bugs in the Python 2 standard library that have not been, and will not be, fixed, as fixing them within the constraints of the Python 2 text model is considered too hard to be worth the effort (to put that effort into context: if you judge the core development team by our actions it is clear that we consider that creating and promoting Python 3 was an easier and more pleasant alternative to attempting to fix those issues while abiding by Python 2’s backwards compatibility requirements).

The revised text model in Python 3 also means that the primary string type is now fully Unicode capable. This brings Python closer to the model used in the JVM, Android, .NET CLR, and Unicode capable Windows APIs. One key consequence of this is that the interpreter core in Python 3 is far more tolerant of paths that contain Unicode characters on Windows (so, for example, having a non-ASCII character in your username should no longer cause any problems with running Python scripts from your home directory on Windows). The surrogateescape error handler added in PEP 383 is designed to bridge the gap between the new text model in Python 3 and the possibility of receiving data through bytes oriented APIs on POSIX systems where the declared system encoding doesn’t match the encoding of the data itself. That error handler is also useful in other cases where applications need to tolerate mismatches between declared encodings and actual data - while it does share some of the problems of the Python 2 Unicode model, it at least has the virtue of only causing problems in the case of errors either in the input data or the declared encoding, where Python 2 could get into trouble in the presence of multiple data sources with different encodings, even if all the input was correctly encoded in its declared encoding.
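
A minimal sketch of the surrogateescape round-tripping behaviour (the filename and stray byte are invented for illustration):

# A filename reported by a POSIX API under a UTF-8 locale, but actually
# containing a stray latin-1 encoded byte
raw_name = b"report-caf\xe9.txt"

# surrogateescape smuggles the undecodable byte through as a lone
# surrogate code point instead of raising UnicodeDecodeError
name = raw_name.decode("utf-8", errors="surrogateescape")
print(repr(name))   # 'report-caf\udce9.txt'

# Encoding with the same error handler reproduces the original bytes exactly
assert name.encode("utf-8", errors="surrogateescape") == raw_name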

Python 3 also embeds Unicode support more deeply into the language itself. With the primary string type handling the full Unicode range, it became practical to make UTF-8 the default source encoding (instead of ASCII) and adjust many parts of the language that were previously restricted to ASCII text (such as identifiers) to now permit a much wider range of Unicode characters. This permits developers with a native language other than English to use names in their own language rather than being forced to use names that fit within the ASCII character set. Some areas of the interpreter that were previously fragile in the face of Unicode text (such as displaying exception tracebacks) are also far more robust in Python 3.
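
For example, the following is valid Python 3 source code (the particular names are invented for illustration, and the file is assumed to be saved as UTF-8, the default source encoding):

π = 3.14159
半径 = 2.0            # "radius"
面積 = π * 半径 ** 2   # "area"
print(面積)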

Removing the implicit type conversions entirely also made it more practical to implement the new internal Unicode data model for Python 3.3, where the internal representation of Unicode strings is automatically adjusted based on the highest value code point that needs to be stored (see PEP 393 for details).

What actually changed in the text model between Python 2 and Python 3?

The Python 2 core text model looks like this:

  • str: 8-bit type containing binary data, or encoded text data in an unknown (hopefully ASCII compatible) encoding, represented as length 1 8-bit strings
  • unicode: 16-bit or 32-bit type (depending on build options) containing Unicode code points, represented as length 1 Unicode strings

That first type is essentially the way POSIX systems model text data, so it is incredibly convenient for interfacing with POSIX environments, since it lets you just copy bits around without worrying about their encoding. It is also useful for dealing with the ASCII compatible segments that are part of many binary protocols.

The conceptual problem with this model is that it is only an appropriate model for boundary code - the kind of code that handles the transformation between wire protocols and file formats (which are always a series of bytes), and the more structured data types actually manipulated by applications (which may include opaque binary blobs, but are more typically things like text, numbers and containers).

Actual applications shouldn’t be manipulating values that “might be text, might be arbitrary binary data”. In particular, manipulating text values as binary data in multiple different text encodings can easily cause a problem the Japanese named “mojibake”: binary data that includes text in multiple encodings, but with no clear structure that defines which parts are in which encoding.

Unfortunately, Python 2 uses a type with exactly those semantics as its core string type, permits silent promotion from the “might be binary data” type to the “is definitely text” type and provides little support for accounting for encoding differences.

So Python 3 changes the core text model to be one that is more appropriate for application code rather than boundary code:

  • str: a sequence of Unicode code points, represented as length 1 strings (always contains text data)
  • bytes: a sequence of integers between 0 and 255 inclusive (always contains arbitrary binary data). While it still has many operations that are designed to make it convenient to work on ASCII compatible segments in binary data formats, it is not implicitly interoperable with the str type.

The hybrid “might be encoded text, might be arbitrary binary data, can interoperate with both other instances of str and also with instances of unicode” type was deliberately removed from the core text model because using the same type for multiple distinct purposes makes it incredibly difficult to reason about correctly. The core model in Python 3 opts to handle the “arbitrary binary data” case and the “ASCII compatible segments in binary data formats” case, leaving the direct manipulation of encoded text to a (currently still hypothetical) third party type (due to the many issues that approach poses when dealing with multibyte and variable width text encodings).

The purpose of boundary code is then to hammer whatever comes in over the wire or is available on disk into a format suitable for passing on to application code.

Unfortunately, there have turned out to be some key challenges in making this model pervasive in Python 3:

  • the same design changes that improve Python 3’s Windows integration by changing several OS interfaces to operate on text rather than binary data also make it more sensitive to locale misconfiguration issues on POSIX operating systems other than Mac OS X. In Python 2, text is always sent and received from POSIX operating system interfaces as binary data, and the associated decoding and encoding operations are fully under the control of the application. In Python 3, the interpreter aims to handle these operations automatically, but in releases up to and including Python 3.6 it needs to rely on the default settings in the OS provided locale module to handle the conversion, making it potentially sensitive to configuration issues that many Python 2 applications could ignore. Most notably, if the OS erroneously claims that “ascii” is a suitable encoding to use for operating system interfaces (as happens by default in a number of cases, due to the formal definition of the ANSI C locale predating the invention of UTF-8 by a few years), the Python 3 interpreter will believe it, and will complain if asked to handle non-ASCII data. PEP 538 and PEP 540 offer some possible improvements in this area (by assuming UTF-8 as the preferred text encoding when running in the default C locale), but it isn’t a trivial fix due to the phase of the interpreter startup sequence where the problem occurs. (Thanks go to Armin Ronacher for clearly articulating many of these details - see his write-up in the click documentation)
  • when migrating libraries and frameworks from Python 2 to Python 3 that handle boundary API problems, the lack of the hybrid “might be text, might be arbitrary bytes” type can be keenly felt, as the implicitly interoperable type was essential to being able to cleanly share code between the two modes of operation. This usually isn’t a major problem for new Python 3 code - such code is typically designed to operate in the binary domain (perhaps relying on the methods for working with ASCII compatible segments), the text domain, or to handle a transition between them. However, code being ported from Python 2 may need to continue to implement hybrid APIs in order to accommodate users that make different decisions regarding whether to operate in the binary domain or the text domain in Python 3 - because Python 2 blurred the distinction, different users will make different choices, and third party libraries and frameworks may need to account for that rather than forcing a particular answer for all users.
  • in the initial Python 3 design, interpolation of variables into a format string was treated solely as a text domain operation. While this proved to be a reasonable design decision for the flexible Python-specific str.format operation, PEP 461 restored printf-style interpolation for ASCII compatible segments in binary data in Python 3.5. Prior to that change, the lack of this feature could sometimes be an irritation when working extensively in Python 3 with wire protocols and file formats that include ASCII compatible segments.
  • while the API design of the str type in Python 3 was based directly on the unicode type in Python 2, the bytes type doesn’t have such a clean heritage. Instead, it evolved over the course of the initial Python 3 pre-release design period, starting from a model where the only type for binary data handling was the type now called bytearray. That type was modelled directly on the array.array('B') type, and hence produced integers when iterating over it or indexing into it. During the pre-release design period, the lack of an immutable binary data type was identified as a problem, and the (then mutable) bytes type was renamed to bytearray and a new immutable bytes type added. The now familiar “bytes literal” syntax was introduced (prepending a “b” prefix to the string literal syntax) and the representations of the two types were also adjusted to be based on the new bytes literal syntax. With the benefit of hindsight, it has become clear another change should have been made at the same time: with so many affordances switched back to matching those of the Python 2 str type (including the use of the new bytes literal syntax to refer to that type in Python 2.6 and 2.7), bytes and bytearray should have been switched away from behaving like a tuple of integers and list of integers (respectively) and instead modified to be containers of length 1 bytes objects, just as the str type is a container of length 1 str objects (the current integer-based behaviour is illustrated in the sketch after this list). Unfortunately, that change was not made at the time, and now backwards compatibility constraints within the Python 3 series itself make it highly unlikely the behaviour will be changed in the future either. PEP 467 covers a number of other still visible remnants of this convoluted design history that are more amenable to being addressed within the constraints of Python’s normal deprecation processes.
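
A short sketch of the integer-based behaviour described in the last point above:

data = b"ABC"
text = "ABC"
print(data[0], text[0])   # 65 A - indexing bytes yields an integer
print(list(data))         # [65, 66, 67]
print(list(text))         # ['A', 'B', 'C']
print(data[0:1])          # b'A' - slicing, by contrast, still returns bytes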

When it comes to Python 3, these changes are a key source of friction between the Python core developers and other experts who had fully mastered the Python 2 text model, especially those that focus on targeting POSIX platforms rather than Windows or the JVM, as well as those that focus on writing boundary code, such as networking libraries, web frameworks and file format parsers and generators. These developers bore a lot of the burden of adjusting to these changes on behalf of their users, often while gaining few or none of the benefits.

That said, while these issues certainly aren’t ideal, they also won’t impact many users that are relying on libraries and frameworks to deal with boundary issues, and can afford to ignore possible misbehaviour in misconfigured POSIX environments. As Python 3 has matured as a platform, most of those areas where it has regressed in suitability relative to Python 2 have been addressed. In particular, the ongoing migrations of Linux distribution utilities from Python 2 to Python 3 have seen many of the platform integration issues on POSIX systems dealt with in a cleaner fashion. The tuple-of-ints and list-of-ints behaviour of bytes and bytearray is unlikely to change, but proposals like PEP 467 may bring better tools for dealing with them.

Why not just assume UTF-8 and avoid having to decode at system boundaries?

The design decision to go with a fixed width Unicode representation both externally and internally has a long history in Python, going all the way back to the addition of Python’s original Unicode support in Python 2.0. Using a fixed width type at that point meant that many of the algorithms could be shared between the original 8-bit str type and the new 16-or-32-bit unicode type. (Note that adoption of this particular approach predates my own involvement in CPython core development - as with many other aspects of CPython’s text handling support, it’s something I’ve learned about while helping with the transition to pervasive Unicode support in the standard library and elsewhere for Python 3).

That design meant that, historically, CPython builds had to choose what size to use for the internal representation of Unicode text. We always chose to use “narrow” builds for the Windows binary installers published on python.org, as the UTF-16 internal representation was the best fit for the Windows text handling APIs.

Linux distributions, by contrast, almost all chose the memory hungry “wide” builds that allocated 32 bits per Unicode code point in Python 2 unicode objects and Python 3 str objects (up to & including Python 3.2), even for pure ASCII text. There’s a reason they went for that option, though: it was better at handling Unicode code points outside the basic multilingual plane. In narrow builds the UTF-16 code points were exposed directly in both the C API and the Python API of the unicode type, and hence were prone to bugs related to incorrect handling of code points greater than 65,535 in code that assumed a one-to-one correspondence between Python code points and Unicode code points. This wasn’t generally a big deal when code points in common use all tended to fit in the BMP, but started to become more problematic as things like mathematical and musical notation, ancient languages, emoticons and additional CJK ideographs were added. Given the choice between greater memory efficiency and correctness, the Linux distributions chose correctness, imposing a non-trivial memory usage penalty on Unicode heavy applications that couldn’t rely entirely on str objects in Python 2 or bytes and bytearray objects in Python 3. Those larger strings also came at a cost in speed, since they not only meant having more data to move around relative to narrow builds (or applications that only allowed 8-bit text), but the larger memory footprint also made CPU caches less effective.
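
The practical difference showed up most clearly with code points outside the BMP. A small sketch (the specific character is just an example):

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the basic multilingual plane.
# On a Python 2 narrow build it was stored as a UTF-16 surrogate pair, so
# len() reported 2 and slicing could split it in half; on wide builds and
# on Python 3.3+ it is a single code point.
clef = "\U0001D11E"
print(len(clef))        # 1 on Python 3.3+
print(hex(ord(clef)))   # 0x1d11e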

When it came to the design of the C level text representation for Python 3, the existing Python 2 Unicode design wasn’t up for reconsideration - the Python 2 unicode type was mapped directly to the Python 3 str type. This is most obvious in the Python 3 C API, which still uses the same PyUnicode_* prefix for text manipulation APIs, as that was the easiest way to preserve compatibility with C extensions that were originally written against Python 2.

However, removing the intertwining of the 8-bit str type and the unicode type that existed in Python 2 paved the way for eliminating the narrow vs wide build distinction in Python 3.3, and eliminating a significant portion of the memory cost associated with getting correct Unicode handling in earlier versions of Python. As a result of PEP 393, strings that consist solely of latin-1 or UCS2 code points in Python 3.3+ are able to use 8 or 16 bits per code point (as appropriate), while still being able to use string manipulation algorithms that rely on the assumption of consistent code point sizes within a given string. As with the original Python 3 implementation, there were also a large number of constraints imposed on this redesign of the internal representation based on the public C API, and that is reflected in some of the more complicated aspects of the PEP.
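
The effect is easy to observe with sys.getsizeof (the exact object sizes are CPython implementation details and vary between versions and platforms, but the relative growth illustrates the idea):

import sys

# 1, 2 or 4 bytes per code point, chosen per string based on the widest
# code point that string contains
for text in ["a" * 10,            # ASCII: 1 byte per code point
             "\u00e9" * 10,       # latin-1 range: still 1 byte per code point
             "\u20ac" * 10,       # needs UCS-2: 2 bytes per code point
             "\U0001d11e" * 10]:  # needs UCS-4: 4 bytes per code point
    print(len(text), sys.getsizeof(text))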

While it’s theoretically possible to write string manipulation algorithms that work correctly with variable width encodings (potentially saving even more memory), it isn’t easy to do so, and for cross-platform runtimes that interoperate closely with the underlying operating system the way CPython does, there isn’t an obvious universally correct choice even today, let alone back in 2006 when Guido first started the Python 3 project. UTF-8 comes closest (hence the wording of this question), but it still poses risks of silent data corruption on Linux if you don’t explicitly transcode data at system boundaries (particularly if the actual encoding of metadata provided by the system is ASCII incompatible, as can happen in East Asian countries using encodings like Shift-JIS and GB-18030) and still requires transcoding between UTF-16-LE and UTF-8 on Windows (the bytes-oriented APIs on Windows are generally restricted to the mbcs encoding, making them effectively useless for proper Unicode handling - it’s necessary to switch to the Windows specific UTF-16 based APIs to make things work properly).

The Python 3 text model also trades additional memory usage for encoding and decoding speed in some cases, including caching the UTF-8 representation of a string when appropriate. In addition to UTF-8, other key codecs like ASCII, latin-1, UTF-16 and UTF-32 are closely integrated with the core text implementation in order to make them as efficient as is practical.

The current Python 3 text model certainly has its challenges, especially around Linux compatibility (see PEP 383 for an example of the complexity associated with that problem), but those are considered the lesser evil when compared to the alternative of breaking C extension compatibility and having to rewrite all the string manipulation algorithms to handle a variable width internal encoding, while still facing significant integration challenges on both Windows and Linux. Instead of anyone pursuing such a drastic change, I expect the remaining Linux integration issues for the existing model to be resolved as we help Linux distributions like Ubuntu and Fedora migrate their system services to Python 3 (in the specific case of Fedora, that migration encompasses both the operating system installer and the package manager).

Still, for new runtimes invented today, particularly those aimed primarily at new server applications running on Linux that can afford to ignore the integration challenges that arise on Windows and older Linux systems using encodings other than UTF-8, using UTF-8 for their internal string representation makes a lot of sense. It’s just best to avoid exposing the raw binary representation of text data for direct manipulation in user code: experience has shown that a Unicode code point based abstraction is much easier to work with, even if it means opting out of providing O(1) indexing for arbitrary code points in a string to avoid allocating additional memory per code point based on the largest code point in the string. For new languages that are specifically designed to accommodate a variable width internal encoding for text, a file-like opaque token based seek/tell style API is likely to be more appropriate for random access to strings than a Python style integer based indexing API. The kind of internal flexibility offered by the latter approach can be seen in Python’s own io.StringIO implementation - in Python 3.4+, that aims to delay creation of a full string object for as long as possible, an optimisation that could be implemented transparently due to the file-like API that type exports.

Note

Python 3 does assume UTF-8 at system boundaries on Mac OS X, since that OS ensures that the assumption will almost always be correct. Starting with Python 3.6, CPython on Windows also assumes that binary data passed to operating system interfaces is in UTF-8 and transcodes it to UTF-16-LE before passing it to the relevant Windows APIs.

For Python 3.7, PEP 538 and PEP 540 are likely to extend the UTF-8 assumption to the default C locale more generally (so other system encodings will still be supported through the locale system, but the problematic ASCII default will be largely ignored).
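
For anyone wanting to check which assumptions are in effect on a given system, a small sketch (the utf8_mode flag only exists on Python 3.7+, hence the getattr fallback):

import locale
import sys

print(sys.getfilesystemencoding())          # e.g. 'utf-8'
print(locale.getpreferredencoding(False))   # e.g. 'UTF-8', or 'ANSI_X3.4-1968' in the C locale
# UTF-8 mode (PEP 540) can be requested with "python3 -X utf8" or PYTHONUTF8=1
print(getattr(sys.flags, "utf8_mode", "not available before 3.7"))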

OK, that explains Unicode, but what about all the other incompatible changes?

The other backwards incompatible changes in Python 3 largely fell into the following categories:

  • dropping deprecated features that were frequent sources of bugs in Python 2, or had been replaced by superior alternatives and retained solely for backwards compatibility
  • reducing the number of statements in the language
  • replacing concrete list and dict objects with more memory efficient alternatives
  • renaming modules to be more PEP 8 compliant and to automatically use C accelerators when available

The first of those were aimed at making the language easier to learn, and easier to maintain. Keeping deprecated features around isn’t free: in order to maintain code that uses those features, everyone needs to remember them and new developers need to be taught them. Python 2 had acquired a lot of quirks over the years, and the 3.x series allowed such design mistakes to be corrected.

While there were advantages to having print and exec as statements, they introduced a sharp discontinuity when switching from the statement forms to any other alternative approach (such as changing print to logging.debug or exec to execfile), and also required the use of awkward hacks to cope with the fact that they couldn’t accept keyword arguments. For Python 3, they were demoted to builtin functions in order to remove that discontinuity and to exploit the benefits of keyword only parameters.
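
For example, as a builtin function, print can accept keyword arguments that the old statement form had no way to express, and can be passed around like any other callable:

import sys

print("fast", "slow", sep=" vs ", end="!\n")          # fast vs slow!
print("warning: disk almost full", file=sys.stderr)   # redirect to stderr

report = print
report("easily swapped out for logging.debug later")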

The increased use of iterators and views was motivated by the fact that many of Python’s core APIs were designed before the introduction of the iterator protocol. That meant a lot of unnecessary lists were being created when more memory efficient alternatives had become possible. We didn’t get them all (you’ll still find APIs that unnecessarily return concrete lists and dictionaries in various parts of the standard library), but the core APIs are all now significantly more memory efficient by default.
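
A few examples of the difference (Python 3 behaviour; the Python 2 equivalents returned full lists):

squares = map(lambda x: x * x, range(10**6))   # no million-element list is built
first = next(squares)                          # values are produced on demand

data = {"a": 1, "b": 2}
keys = data.keys()         # a lightweight view, not a copied list
data["c"] = 3
print(list(keys))          # ['a', 'b', 'c'] - the view reflects the change

print(list(range(5)))      # an explicit list() call when a real list is needed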

As with the removal of deprecated features, the various renaming operations were designed to make the language smaller and easier to learn. Names that don’t follow standard conventions need to be remembered as special cases, while those that follow a pattern can be derived just by remembering the pattern. Using the API compatible C accelerators automatically also means that end users no longer need to know about and explicitly request the accelerated variant, and alternative implementations don’t need to provide the modules under two different names.
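
The pickle module is a typical example of the accelerator change (the exact class shown is a CPython implementation detail):

# Python 2 code had to choose between pickle (portable) and cPickle (fast);
# Python 3 exposes a single pickle module that transparently uses the C
# accelerator module (_pickle) when it is available
import pickle
print(pickle.Pickler)   # <class '_pickle.Pickler'> on CPython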

No backwards incompatible changes were made just for the sake of making them. Each one was justified (at least at the time) on the basis of making the language either easier to learn or easier to use.

With the benefit of hindsight, a number of these other changes would probably have been better avoided (especially some of the renaming ones), but even those cases at least seemed like a good idea at the time. At this point, internal backwards compatibility requirements within the Python 3.x series mean it isn’t worth the hassle of changing them back, especially given the existence of the six compatibility project and other third party modules that support both Python 2 and Python 3 (for example, the requests package is an excellent alternative to using the low level urllib interfaces directly, even though six does provide appropriate cross-version compatible access through the six.moves.urllib namespace).

What other notable changes in Python 3 depend on the text model change?

One of the consequences of the intertwined implementations of the str and unicode types in Python 2 is that it made it difficult to update them to correctly interoperate with anything else. The dual type text model also made it quite difficult to add Unicode support to various APIs that previously didn’t support it.

This isn’t an exhaustive list, but here are several of the enhancements in Python 3 that would likely be prohibitively difficult to backport to Python 2 (even when they’re technically backwards compatible):

  • PEP 393 (more efficient text storage in memory)
  • Unicode identifier support
  • full Unicode module name support
  • improvements in Unicode path handling on Windows
  • multiple other improvements in Unicode handling when interfacing with Windows APIs
  • more robust and user friendly handling of Unicode characters in object representations and when displaying exceptions
  • increased consistency in Unicode handling in files and at the interactive prompt (although the C locale on POSIX systems still triggers undesirable behaviour in Python 3)
  • greater functional separation between text encodings and other codecs, including tailored exceptions nudging users towards the more generic APIs when needed, as illustrated in the sketch after this list (this change in Python 3.4 also eliminates certain classes of remote DOS attack targeted at the compression codecs in the codec machinery when using the convenience methods on the core types rather than the unrestricted interfaces in the codecs module)
  • using the new IO model (with automatic encoding and decoding support) by default
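
The codec separation mentioned above looks like this in practice (the exact error wording varies a little between Python 3 versions):

import codecs

# The convenience methods on the core types are restricted to text encodings,
# and the resulting error points towards the generic codecs APIs
try:
    "example".encode("rot_13")
except LookupError as exc:
    print(exc)   # 'rot_13' is not a text encoding; use codecs.encode() ...

# The unrestricted interfaces in the codecs module still handle the
# binary and text transforms
print(codecs.encode("example", "rot_13"))   # rknzcyr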

When is the last release of Python 2.7 coming out?

Python 2.7.18 is planned for release during PyCon North America in April 2020, per PEP 373. More details:

January 1, 2020: Code freeze for Python 2.7.18

The “End of Life/sunset” of Python 2.7 was January 1, 2020. Until January 1, 2020, the release manager for 2.7.x was working on Release 2.7.18, the final release of Python 2 (developers added a few improvements to Python 2.7 between the 2.7.17 release on October 19, 2019 and the sunset date of January 1, 2020). On January 1, 2020, the release manager stopped development and froze the codebase (see python-dev discussion): from that date, there will be no backports to 2.7.18 from Python 3.

April 2020: Final Production Release of Python 2.7.18

It is expected that there will be no patches after the code freeze date for 2.7.18. (No regressions were introduced between the Python 2.7.17 release in October 2019 and the code freeze date of January 1, 2020.) Between January 1, 2020 and April 2020, the release manager will shepherd the release through the beta and Release Candidate process.

What are (or were) some of the key dates in the Python 3 transition?

Note

This list is rather incomplete and I’m unlikely to find the time to complete it - if anyone is curious enough to put together a more comprehensive timeline, feel free to use this answer as a starting point, or else just send a PR to add more entries to this list.

At least the following events should be included in a more complete list:

  • IPython Python 3 support
  • Cython Python 3 support
  • SWIG Python 3 support
  • links for the Ubuntu, Fedora and openSUSE “Python 3 as default” migration plans
  • SQL Alchemy Python 3 support
  • pytz Python 3 support
  • PyOpenSSL support
  • mod_wsgi Python 3 support (first 3.x WSGI implementation)
  • Tornado Python 3 support (first 3.x async web server)
  • Twisted Python 3 support (most comprehensive network protocol support)
  • Pyramid Python 3 support (first major 3.x compatible web framework)
  • Django 1.5 and 1.6 (experimental and stable Python 3 support)
  • Werkzeug and Flask Python 3 support
  • requests Python 3 support
  • pyside Python 3 support (first Python 3.x Qt bindings)
  • pygtk and/or pygobject Python support
  • wxPython phoenix project
  • VTK Python 3 support in August 2015 (blocked Mayavi, which blocked Canopy)
  • cx-Freeze Python 3 support
  • greenlet Python 3 support
  • pylint Python 3 support
  • nose2 Python 3 support
  • pytest Python 3 support
  • Editor/IDE support for Python 3 in: PyDev, Spyder, Python Tools for Visual Studio, PyCharm, WingIDE, Komodo (others?)
  • Embedded Python 3 support in: Blender, Kate, vim, gdb, gcc, LibreOffice (others?)
  • version availability in services like Google DataLab and Azure Notebooks
  • Python 3 availability in Heroku
  • availability in the major Chinese public cloud platforms (Alibaba/Aliyun, Tencent Qcloud, Huawei Enterprise Cloud, etc)
  • the day any bar on https://python3wos.appspot.com/ or wedge on http://py3readiness.org/ turned green was potentially a significant step for some subsection of the community :)

2006

March 2006: Guido van Rossum (the original creator of Python and hence Python’s Benevolent Dictator for Life), with financial support from Google, took the previously hypothetical “Python 3000” project and turned it into an active development project, aiming to create an updated Python language definition and reference interpreter implementation that addressed some fundamental limitations in the ability of the Python 2 reference interpreter to correctly handle non-ASCII text. (The project actually started earlier than this - March 2006 was when the python-3000 list was created to separate out the longer term Python 3 discussions from the active preparation for the Python 2.5 final release)

April 2006: Guido published PEP 3000, laying the ground rules for Python 3 development, and detailing the proposed migration strategy for Python 2 projects (the recommended porting approach has changed substantially since then, see What other changes have occurred that simplify migration? for more details). PEP 3100 describes several of the overall goals of the project, and lists many smaller changes that weren’t covered by their own PEPs. PEP 3099 covers a number of proposed changes that were explicitly declared out of scope of the Python 3000 project.

At this point in time, Python 2 and Python 3 started being developed in parallel by the core development team for the reference interpreter.

2007

August 2007: The first alpha release of Python 3.0 was published.

2008

February 2008: The first alpha release of Python 2.6 was published alongside the third alpha of Python 3.0. The release schedules for both Python 2.6 and 3.0 are covered in PEP 361.

October 2008: Python 2.6 was published, including the backwards compatible features defined for Python 3.0, along with a number of __future__ imports and the -3 switch to help make it practical to add Python 3 support to existing Python 2 software (or to migrate entirely from Python 2 to Python 3). While Python 2.6 received its final upstream security update in October 2013, maintenance & support remains available through some commercial redistributors.

December 2008: In a fit of misguided optimism, Python 3.0 was published with an unusably slow pure Python IO implementation - it worked tolerably well for small data sets, but was entirely impractical for handling realistic workloads on the CPython reference interpreter. (Python 3.0 received a single maintenance release, but was otherwise entirely superseded by the release of Python 3.1)

ActiveState became the first company I am aware of to start offering commercial Python 3 support by shipping ActivePython 3.0 almost immediately after the upstream release was published. They have subsequently continued this trend of closely following upstream Python 3 releases.

2009

March 2009: The first alpha release of Python 3.1, with an updated C accelerated IO stack, was published. PEP 375 covers the details of the Python 3.1 release cycle.

June 2009: Python 3.1 final was published, providing the first version of the Python 3 runtime that was genuinely usable for realistic workloads. Python 3.1 received its final security update in April 2012, and even commercial support for this version is no longer available.

September 2009: setuptools 0.6.2 was released, the first version to support Python 3.

October 2009: PEP 3003 was published, declaring a moratorium on language level changes in Python 2.7 and Python 3.2. This was done to deliberately slow down the pace of core development for a couple of years, with additional effort focused on standard library improvements (as well as some improvements to the builtin types).

December 2009: The first alpha of Python 2.7 was published. PEP 373 covers the details of the Python 2.7 release cycle.

2010

July 2010: Python 2.7 final was published, providing many of the backwards compatible features added in the Python 3.1 and 3.2 releases. Python 2.7 remained fully supported by the core development team for the rest of the decade, receiving maintenance & security updates until January 2020.

Once the Python 2.7 maintenance branch was created, the py3k development branch was retired: for the first time, the default branch in the main CPython repo was the upcoming version of Python 3.

August 2010: The first alpha of Python 3.2 was published. PEP 392 covers the details of the Python 3.2 release cycle. Python 3.2 restored preliminary support for the binary and text transform codecs that had been removed in Python 3.0.

NumPy 1.5.0 was released, the first version to support Python 3.

October 2010: PEP 3333 was published to define WSGI 1.1, a Python 3 compatible version of the Python Web Server Gateway Interface.

2011

February 2011: Python 3.2 final was published, providing the first version of Python 3 with support for the Web Server Gateway Interface. Python 3.2 received its final security update in February 2016, and even commercial support for this version is no longer available.

SciPy 0.9.0 was released, the first version to support Python 3.

March 2011: After Arch Linux updated their Python symlink to refer to Python 3 (breaking many scripts that expected it to refer to Python 2), PEP 394 was published to provide guidance to Linux distributions on more gracefully handling the transition from Python 2 to Python 3.

Also in March, CPython migrated from Subversion to Mercurial (see PEP 385), with the first message from Mercurial to the python-checkins list being this commit from Senthil Kumaran. This ended more than two years of managing parallel updates of four active branches using svnmerge rather than a modern DVCS.

April 2011: pip 1.0 was released, the first version to support Python 3.

virtualenv 1.6 was released, the first version to support Python 3.

November 2011: PEP 404 (the Python 2.8 Un-release Schedule) was published to make it crystal clear that the core development team had no plans to make a third parallel release in the Python 2.x series.

2012

March 2012: The first alpha of Python 3.3 was published. PEP 398 covers the details of the Python 3.3 release cycle. Notably, Python 3.3 restored support for Python 2 style Unicode literals after Armin Ronacher and other web framework developers pointed out that this was one change that the web frameworks couldn’t handle on behalf of their users. PEP 414 covers the detailed rationale for that change.

April 2012: Canonical published Ubuntu 12.04 LTS, including commercial support for both Python 2.7 and Python 3.2.

September 2012: Six and a half years after the inauguration of the Python 3000 project, Python 3.3 final was published as the first Python 3 release without a corresponding Python 2 feature release. This release introduced the PEP 380 yield from syntax that was used heavily in the asyncio coroutine framework provisionally introduced to the standard library in Python 3.4, and subsequently declared stable in Python 3.6.

October 2012: PEP 430 was published, and the online Python documentation updated to present the Python 3 documentation by default. In order to preserve existing links, deep links continue to be interpreted as referring to the Python 2.7 documentation.

2013

March 2013: PEP 434 redefined IDLE as an application shipped with Python rather than part of the standard library, allowing the addition of new features in maintenance releases. Significantly, this allowed the Python 2.7 IDLE to be brought more into line with the features of the Python 3.x version.

Continuum Analytics started offering commercial support for cross-platform Python 3.3+ environments through their “Anaconda” Python distributions.

Pillow 2.0.0 was released, the first version to support Python 3.

August 2013: The first alpha of Python 3.4 was published. PEP 429 covers the details of the Python 3.4 release cycle. Amongst other changes, Python 3.4 restored full support for the binary and text transform codecs that were reinstated in Python 3.2, while maintaining the “text encodings only” restriction for the convenience methods on the builtin types.

September 2013: Red Hat published “Red Hat Software Collections 1.0”, providing commercial support for both Python 2.7 and Python 3.3 on Red Hat Enterprise Linux systems, with later editions adding support for additional 3.x releases.

December 2013: The initial development of MicroPython, a variant of Python 3 specifically for microcontrollers, was successfully crowdfunded on Kickstarter.

2014

March 2014: Python 3.4 final was published as the second Python 3 release without a corresponding Python 2 release. It included several features designed to provide a better starting experience for newcomers to Python, such as bundling the “pip” installer by default, and including a rich asynchronous IO library.

April 2014: Ubuntu 14.04 LTS, initial target release for the “Only Python 3 on the install media” Ubuntu migration plan. (They didn’t quite make it - a few test packages short on Ubuntu Touch, further away on the server and desktop images)

Red Hat also announced the creation of softwarecollections.org as the upstream project powering the Red Hat Software Collections product. The whole idea of both the project and the product is to make it easy to run applications using newer (or older!) language, database and web server runtimes, without interfering with the versions of those runtimes integrated directly into the operating system.

Note

With the original “5 years for migration to Python 3” target date approaching, April 2014 is also when Guido van Rossum amended the Python 2.7 release PEP to move the expected end-of-life date for Python 2.7 out to 2020.

May 2014: Python 2.7.7 was published, the first Python 2.7 maintenance release to incorporate additional security enhancement features as described in PEP 466. Also the first release where Microsoft contributed developer time to the creation of the Windows installers.

June 2014: The first stable release of PyPy3, providing a version of the PyPy runtime that is compatible with Python 3.2.5 (together with PEP 414’s restoration of the u'' string literal prefix that first appeared in Python 3.3 for CPython).

Red Hat published Red Hat Enterprise Linux 7, with Python 2.7 as the system Python. This release ensures that Python 2.7 will remain a commercially supported platform until at least 2024 (based on Red Hat’s 10 year support lifecycle).

Note

June 2014 also marked 5 years after the first production capable Python 3.x release (Python 3.1), and the original target date for completion of the Python 3 migration.

July 2014: CentOS 7 was released, providing a community distro based on Red Hat Enterprise Linux 7, and marking the beginning of the end of the Python 2.7 rollout (the CentOS system Python is a key dependency for many Python users).

boto v2.32.0 released with Python 3 support for most modules.

nltk 3.0b1 released with Python 3 support and the NLTK book switched over to covering Python 3 by default.

October 2014: SUSE Linux Enterprise Server 12 was released, containing supported Python 3.4 RPMs, adding SUSE to the list of commercial Python 3 redistributors.

2015

February 2015: The first alpha of Python 3.5 was published. PEP 478 covers the details of the Python 3.5 release cycle. Amongst other changes, PEP 461 restored support for printf-style interpolation of binary data, addressing a significant usability regression in Python 3 relative to Python 2.

March 2015: Microsoft Azure App Service launched with both Python 2.7 and Python 3.4 support, adding Microsoft to the list of commercial Python redistributors for the first time.

August 2015: At the Fedora community’s annual Flock conference, Denise Dumas (Red Hat’s VP of Platform Engineering) explicitly stated that it would be an engineering goal to include only Python 3 in the base operating system for the next major version of Red Hat Enterprise Linux (previously this had been implied by Red Hat’s work on migrating Fedora and its infrastructure to Python 3, but not explicitly stated in a public venue).

September 2015: Python 3.5 final was released, bringing native syntactic support for asynchronous coroutines and a matrix multiplication operator, as well as the typing module for static type hints. Applications, libraries and frameworks wishing to take advantage of the new syntactic features need to reconsider whether or not to continue supporting Python 2.7.

Twisted 15.4 was released, the first version to include a Python 3 compatible version of the “Twisted Trial” test runner. This allowed the Twisted project to start running its test suite under Python 3, leading to steadily increasing Python 3 compatibility in subsequent Twisted releases.

October 2015: Fedora 23 shipped with only Python 3 in the LiveCD and all default images other than the Server edition.

MicroPython support for the BBC micro:bit project was publicly announced, ensuring first class Python 3 support in a significant educational initiative.

PyInstaller 3.0 was released, supporting Python 2.7 and 3.3+.

2016

March 2016: gevent 1.1 was released, supporting Python 2.6, 2.7, and 3.3+.

May 2016: Several key projects in the Scientific Python community published the Python 3 Statement, explicitly declaring their intent to end Python 2 support in line with the reference interpreter’s anticipated 2020 date for the end of free community support.

August 2016: Google App Engine added official Python 3.4(!) support to their Flexible Environments (Python 3.5 support followed not long after, but the original announcement was for Python 3.4).

As part of rolling out Python 3.5 support, Microsoft Azure published instructions on how to select a particular Python version using App Service Site Extensions.

Initial release of Enthought Deployment Manager, with support for Python 2.7 and 3.5.

Mozilla provided the PyPy project with a development grant to bring their PyPy3 variant up to full compatibility with Python 3.5.

December 2016: Python 3.6 final was released, bringing further syntactic enhancements for asynchronous coroutines and static type hints, as well as a new compiler assisted string formatting syntax that manages to be both more readable (due to the use of inline interpolation expressions) and faster (due to the compiler assisted format parsing) than previous string formatting options. Through PEP 528 and PEP 529, this release also featured significant improvements to the Windows compatibility of bytes-centric POSIX applications, and the Windows-specific py launcher started using Python 3 by default when both Python 2.x and 3.x are available on the system.

2017

March 2017: The first beta release of PyPy3 largely compatible with Python 3.5 was published (including support for the Python 3.6 f-string syntax).

Enthought Canopy 2.0.0 available, supporting Python 2.7 and 3.5 (official binary release date TBD - as of April 2017, the download page still offers Canopy 1.7.4)

April 2017: AWS Lambda added official Python 3.6 support, making Python 3 available by default through the 3 largest public cloud providers (Amazon, Microsoft, Google).

IPython 6.0 was released, the first feature release to require Python 3. The IPython 5.x series remains in maintenance mode as the last version supporting Python 2.7 (and Python 3 based variants of IPython retain full support for running and interacting with Python 2 language kernels using Project Jupyter’s language independent notebook protocol).

December 2017: Django released Django 2.0, the first version of Django to drop support for Python 2.7.

2018

March 2018: Guido van Rossum clarified that “The way I see the situation for 2.7 is that EOL is January 1st, 2020, and there will be no updates, not even source-only security patches, after that date. Support (from the core devs, the PSF, and python.org) stops completely on that date. If you want support for 2.7 beyond that day you will have to pay a commercial vendor.”

June 2018: Python 3.7.0 final was released, bringing improvements such as the new built-in breakpoint() function defined by PEP 553, time functions with nanosecond resolution per PEP 564, and more streamlined Python documentation translations.

September 2018: matplotlib released 3.0.0, the first release to drop support for Python 2.x.

2019

May 2019: The release of Red Hat Enterprise Linux 8. RHEL 8 does not come with Python 2 or Python 3 already installed and usable by default. Red Hat recommended users choose Python 3, and the platform Python for use by system tools in RHEL 8 is Python 3.6.

August 2019: The entirety of http://py3readiness.org/ turned green, indicating Python 3 support for the 360 most downloaded packages on PyPI.

September 2019: The release of CentOS 7.7 (in which Python 3 is available) and CentOS 8 (which follows RHEL 8 in its approach to Python).

Future

Note

At time of writing, the events below are in the future, and hence speculative as to their exact nature and timing. However, they reflect currently available information based on the stated intentions of developers and distributors.

April 2018: Revised anticipated date for Ubuntu and Fedora to have finished migrating default components of their respective server editions to Python 3 (some common Linux components, most notably the Samba protocol server, proved challenging to migrate, so the stateful server variants of these distributions ended up taking longer to migrate to Python 3 than other variants that omitted those components from their default package set)

January? 2020: Anticipated date for Python 2.7 to switch to security fix only mode, ending roughly thirteen years of parallel maintenance of Python 2 and 3 by the core development team for the reference interpreter.

April 2021: Anticipated date for Ubuntu LTS 16.04 to go end of life, the first potential end date for commercial Python 2 support from Canonical (if Python 2.7 is successfully migrated to the community supported repositories for the Ubuntu 18.04 LTS release)

April 2024: Anticipated date for Ubuntu LTS 18.04 to go end of life, the second potential end date for commercial Python 2 support from Canonical (if it proves necessary to keep Python 2.7 in the commercially supported repositories as a dependency for the Ubuntu 18.04 LTS release)

June 2024: Anticipated date for Red Hat Enterprise Linux 7 to go end of life, also anticipated to be the last commercially supported redistribution of the Python 2 series.

When did Python 3 become the obvious choice for new projects?

I put the date for this as the release of Python 3.5, in September 2015. This release brought with it two major syntactic enhancements (one giving Python’s coroutine support its own dedicated syntax, distinct from generators, and another providing a binary operator for matrix multiplication), and restored a key feature that had been missing relative to Python 2 (printf-style binary interpolation support). It also incorporated a couple of key reliability and maintainability enhancements, in the form of automated handling of EINTR signals, and the inclusion of a gradual typing framework in the standard library.
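
As a minimal sketch of two of those Python 3.5 additions (printf-style interpolation on bytes objects from PEP 461, and the matrix multiplication operator from PEP 465), with the Vec2 class being a toy stand-in - in practice the @ operator is implemented by third party types such as NumPy arrays:

# Bytes interpolation returned in Python 3.5 (PEP 461)
request_line = b"GET %s HTTP/1.1\r\n" % b"/index.html"
print(request_line)  # b'GET /index.html HTTP/1.1\r\n'

# The "@" operator (PEP 465) delegates to __matmul__; Vec2 is purely illustrative
class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __matmul__(self, other):
        # Dot product, standing in for a real matrix multiplication
        return self.x * other.x + self.y * other.y

print(Vec2(1, 2) @ Vec2(3, 4))  # 11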

Others may place the boundary at the release of Python 3.6, in December 2016, as the new “f-string” syntax provides a form of compiler-assisted string interpolation that is both faster and more readable than its predecessors:

print("Hello %s!" % name)        # All versions
print("Hello {0}!".format(name)) # Since Python 2.6 & 3.0
print("Hello {}!".format(name))  # Since Python 2.7 & 3.2
print(f"Hello {name}!")          # Since Python 3.6

Python 3.6 also provides further enhancements to the native coroutine syntax, as well as full syntactic support for annotating variables with static type hints.
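
A short sketch of those Python 3.6 additions (variable annotations from PEP 526, asynchronous generators and comprehensions from PEPs 525 and 530); the names used here are illustrative only:

import asyncio
from typing import List

# Variable annotations (PEP 526): the type hint lives in the code itself,
# where static checkers like mypy can see it
measurements: List[float] = []

async def sampled(n):
    # Asynchronous generator (PEP 525)
    for i in range(n):
        await asyncio.sleep(0)
        yield i / 10

async def main():
    # Asynchronous comprehension (PEP 530)
    return [value async for value in sampled(3)]

loop = asyncio.get_event_loop()
print(loop.run_until_complete(main()))  # [0.0, 0.1, 0.2]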

Going in to this transition process, my personal estimate was that it would take roughly 5 years to get from the first production ready release of Python 3 to the point where its ecosystem would be sufficiently mature for it to be recommended unreservedly for all new Python projects.

Since 3.0 turned out to be a false start due to its IO stack being unusably slow, I start that counter from the release of 3.1: June 27, 2009. With Python 3.5 being released a little over 6 years after 3.1 and 3.6 a little more than a year after that, that means we clearly missed that original goal - the text model changes in particular proved to be a larger barrier to migration than expected, which slowed adoption by existing library and framework developers.

However, despite those challenges, key parts of the ecosystem were able to successfully add Python 3 support well before the 3.5 release. NumPy and the rest of the scientific Python stack supported both versions by 2015, as did several GUI frameworks (including PyGame).

The Pyramid, Django and Flask web frameworks supported both versions, as did the mod_wsgi Python application server, and the py2exe, py2app and cx-Freeze binary creators. The upgrade of Pillow from a repackaging project to a full development fork also brought PIL support to Python 3.

nltk supported Python 3 as of nltk 3.0, and the NLTK book switched to be based on Python 3 at the same time.

For AWS users, most boto modules became available on Python 3 as of the boto 2.32.0 release (http://boto.readthedocs.org/en/latest/releasenotes/v2.32.0.html).

PyInstaller is a popular option for creating native system installers for Python applications, and it has supported Python 3 since the 3.0 release in October 2015.

gevent is a popular alternative to writing natively asynchronous code, and it became generally available for Python 3 with the 1.1 release in March 2016.

As of April 2017, porting the full Twisted networking framework to Python 3 is still a work in progress, but many parts of it are already fully operational, and for new projects, native asyncio-based alternatives are often going to be available in Python 3 (especially for common protocols like HTTPS).

I think Python 3.5 is a superior language to 2.7 in almost every way (with the error reporting improvements being the ones I missed most when my day job involved working on a Python 2.6 application).

For educational purposes, there are a few concepts like functions, iterables and Unicode that need to be introduced earlier than was needed in Python 2, and there are still a few rough edges in adapting between the POSIX text model and the Python 3 one, but these are more than compensated for through improved default behaviours and more helpful error messages.

While students in enterprise environments may still need to learn Python 2 for a few more years, there are some significant benefits in learning Python 3 first, as that means students will already know which concepts survived the transition, and be more naturally inclined to write code that fits into the common subset of Python 2 and Python 3. This approach will also encourage new Python users that need to use Python 2 for professional reasons to take advantage of the backports and other support modules on PyPI to bring their Python 2.x usage as close to writing Python 3 code as is practical.

Support in enterprise Linux distributions is also a key point for uptake of Python 3. Canonical have already shipped long term support for three versions of Python 3 (Python 3.2 in Ubuntu 12.04 LTS, 3.4 in 14.04 LTS, and 3.5 in 16.04 LTS) and are continuing with the process of eliminating Python 2 from the installation images.

A Python 3 stack has existed in Fedora since Fedora 13 and has been growing over time, with Python 2 successfully removed from the live install CDs in late 2015 (Fedora 23). Red Hat also now ship fully supported Python 3.x runtimes as part of the Red Hat Software Collections product and the OpenShift Enterprise self-hosted Platform-as-a-Service offering (with new 3.x versions typically becoming commercially available within 6-12 months of the upstream release, and then remaining supported for 3 years from that point).

At Fedora’s annual Flock conference in August 2015, Denise Dumas (VP of Platform Engineering) also indicated that Red Hat aimed to have the next major version of Red Hat Enterprise Linux ship only Python 3 in the base operating system, with Python 2 available solely through the Software Collections model (inverting the current situation, where Python 2 is available in both Software Collections and the base operating system, while Python 3 is only commercially available through Software Collections and the Software Collections based OpenShift environments).

The Arch Linux team have gone even further, making Python 3 the default Python on Arch installations. I am dubious as to the wisdom of their specific migration strategy, but I certainly can’t complain about the vote of confidence!

The OpenStack project, likely the largest open source Python project short of the Linux distro aggregations, is also in the process of migrating from Python 2 to Python 3, and maintains a detailed status tracking page for the migration.

Outside the Linux ecosystem, other Python redistributors like ActiveState, Enthought, and Continuum Analytics provide both Python 2 and Python 3 releases, and Python 3 environments are also available through the major public cloud platforms.

When can we expect Python 2 to be a purely historical relic?

The short answer is: 2024, four years after CPython support ends in 2020.

Python 2 is still a good language. While I think Python 3 is a better language (especially when it comes to the text model, error reporting, the native coroutine syntax in Python 3.5, and the string formatting syntax in Python 3.6), we’ve deliberately designed the migration plan so users could update on their timetable rather than ours (at least within a window of several years), and we expect commercial redistributors to extend that timeline even further.

The PyPy project have also stated their intention to continue providing a Python 2.7 compatible runtime indefinitely, since the RPython language used to implement PyPy is a subset of Python 2 rather than of Python 3.

I personally expect CPython 2.7 to remain a reasonably common deployment platform until mid 2024. Red Hat Enterprise Linux 7 (released in June 2014) uses CPython 2.7 as the system Python, and many library, framework and application developers base their minimum supported version of Python on the system Python in RHEL (especially since that also becomes the system Python in downstream rebuilds like CentOS and Scientific Linux). While Red Hat’s actively trying to change that slow update cycle by encouraging application developers to target the Software Collections runtimes rather than the system Python, that change in itself is a significant cultural shift for the RHEL/CentOS user base.

Aside from Blender, it appears many publishing and animation tools with Python support are happy enough with Python 2.7 that they aren’t quickly moving to Python 3. Scribus, and some AutoDesk tools like 3ds Max, Maya and MotionBuilder, support Python 2.7 and are only slowly moving to support Python 3. But some have made stronger commitments. Inkscape’s LTS 0.92.x line aims to continue supporting Python 2.7 but 0.92.5 will also support Python 3, and the 1.0 line will drop support for Python 2. And the VFX Reference Platform (tracked by AutoDesk etc.) is moving to Python 3.7 in calendar year 2020: “Python 3 in CY2020 is a firm commitment, it will be a required upgrade as Python 2 will no longer be supported beyond 2020.”

Many GIS tools similarly currently still use Python 2.7. This actually makes a fair bit of sense, especially for the commercial tools, since the Python support in these tools is there primarily to manipulate the application data model and there arguably aren’t any major improvements in Python 3 for that kind of use case as yet, but there is still some risk of breaking existing scripts if the application updates to Python 3. However, ESRI’s ArcGIS has handled the migration problem by switching to Python 3 in the newer ArcGIS Pro product line, sticking with Python 2 in the ArcGIS Desktop/Server/Engine product lines, and providing tools to assist with migration between them.

From a web security perspective, Python 2’s standard library is already a relic. Anyone doing web programming in Python 2 that touches the public internet should not be relying solely on the standard library, since it’s too old, and instead should be relying more on third party modules from PyPI. For example, instead of the SSL module, use Requests.
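
As a brief sketch of that advice (the URL is just a placeholder), requests performs certificate verification by default and tracks modern TLS practice far more closely than the Python 2 era standard library does:

import requests

# Certificate verification and sensible TLS defaults come for free here,
# unlike the httplib/urllib2 stack that shipped with Python 2
response = requests.get("https://www.python.org/", timeout=10)
response.raise_for_status()
print(response.status_code, len(response.text))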

For the open source applications where Python 2 is currently seen as a “good enough” scripting engine, the main driver for adding Python 3 scripting support is likely to be commercial distribution vendors looking to drop commercial Python 2 runtime support - the up front investment in application level Python 3 support would be intended to pay off in the form of reduced long term sustaining engineering costs at the language runtime level.

That said, the Python 3 reference interpreter also offers quite a few new low level configuration options that let embedding applications control the memory allocators used, monitor and control all bytecode execution, and various other improvements to the runtime embedding functionality, so the natural incentives for application developers to migrate are starting to accumulate, which means we may see more activity on that front as the 2020 date for the end of community support of the Python 2 series gets closer.

Why have Linux distributions taken so long to migrate away from Python 2?

With RHEL 8 and Ubuntu LTS 18.04 now using Python 3.6 for their primary system Python installation, and Debian 10 and SLES 15 offering Python 3 support alongside Python 2, it’s reasonable to wonder why it took more than a decade for Linux distributions to reach a point where their migration away from the Python 2.x series is nearing completion.

While part of the problem was simply the sheer amount of code to be reviewed and potentially updated, the core of the delay was the issues discussed in the answer to What’s up with POSIX systems in Python 3?: with Python 3’s internal text model now being different from the one in POSIX, the historical mechanisms for interacting with POSIX systems from Python 2.x didn’t quite work right in earlier Python 3.x releases, and that situation needed to be improved before the interpreter would once again be fully suitable for use in core operating system components.

That situation was largely resolved with the implementation of both PEP 538 (locale coercion for the legacy C locale) and PEP 540 (UTF-8 mode) in CPython 3.7. The system Python installation in RHEL 8 actually includes a backport of the PEP 538 locale coercion behaviour, as per the relevant section in the PEP.

(Note: Red Hat and Canonical have both contributed significantly to the broad adoption of Python 3 as a platform, migrating not only their own projects and applications, but also often investing time in adding Python 3 support to the open source libraries that they depend on.)

Why did Apple decide not to ship Python 3?

The short answer is: they decided not to ship Python, and several other scripting languages, at all (with the OS, for end user use), and we believe that decision had nothing to do with the 2-to-3 transition.

Unlike the open source Linux distributors, Apple doesn’t generally make the rationale for their engineering decisions public. The one thing we do know in this case is that in the macOS 10.15 release notes, Apple have declared all of the open source language runtimes that they currently ship (including Python, Perl, and Ruby) to be deprecated, and have advised application developers that require those runtimes to bundle their own interpreter with their application. The macOS 10.15 release notes also explicitly advise against using the macOS system installation of Python 2.7 for any purpose.

So while it’s possible that the creation of Python 3 was one of the factors that contributed to this eventual outcome, the product management decision within Apple appears to have been “We will not actively promote or encourage any developer experience for our platforms that we don’t largely control” (specifically, Objective-C and Swift). They’re hardly unique amongst platform developers in that regard - there were major battles for control between Sun and Microsoft over Java that contributed to Microsoft’s eventual creation of the C# programming language, and the later fights between Oracle and Google (also over Java) presumably had some impact on the latter’s decision to embrace Kotlin as their preferred language for Android app development.

(Note: Linux distribution vendors also advise against using the system Python runtimes to run your own custom applications, and RHEL 8 installs the system Python in a way that means it isn’t available to users by default.)

But uptake is so slow, doesn’t this mean Python 3 is failing as a platform?

While the frequency with which this question is asked has declined markedly since 2015 or so, a common thread I saw running through such declarations of “failure” was people not quite understanding the key questions where the transition plan was aiming to change the answers. These are the three key questions:

  • “I am interested in learning Python. Should I learn Python 2 or Python 3?”
  • “I am teaching a Python class. Should I teach Python 2 or Python 3?”
  • “I am an experienced Python developer starting a new project. Should I use Python 2 or Python 3?”

At the start of the migration, the answer to all of those questions was obviously “Python 2”. By August 2015, I considered the answer to be “Python 3.4, unless you have a compelling reason to choose Python 2 instead”. Possible compelling reasons included “I am using existing course material that was written for Python 2”, “I am teaching the course to maintainers of an existing Python 2 code base”, “We have a large in-house collection of existing Python 2 only support libraries we want to reuse” and “I only use the version of Python provided by my Linux distro vendor and they currently only support Python 2” (in regards to that last point, we realised early that the correct place to tackle it was on the vendor side, and by late 2014, all of Canonical, Red Hat, and SUSE had commercial Python 3 offerings available).

Note the question that isn’t on the list: “I have a large Python 2 application which is working well for me. Should I migrate it to Python 3?”.

While OpenStack and some key Linux distributions have answered “Yes”, for most organisations the answer to that question remained “No” for several years while companies like Canonical, Red Hat, Facebook, Google, Dropbox, and others worked to migrate their own systems, and published the related migration tools (such as the pylint --py3k option, and the work that has gone into the mypy and typeshed projects to allow Python 3 static type analysis to be applied to Python 2 programs prior to attempting to migrate them).

While platform effects are starting to shift even the answer to that question towards “Maybe” for the majority of users (and Python 3 gives Python 2 a much nicer exit strategy to a newer language than COBOL ever did), the time frame for that change is a lot longer than the five years that was projected for changing the default choice of Python version for green field projects.

That said, reducing or eliminating any major remaining barriers to migration is an ongoing design goal for Python 3.x releases, at least in those cases where the change is also judged to be an internal improvement within Python 3 (for example, the restoration of binary interpolation support in Python 3.5 was motivated not just by making it easier to migrate from Python 2, but also to make certain kinds of network programming and other stream processing code easier to write in Python 3).

In the earlier days of the Python 3 series, several of the actions taken by the core development team were actually deliberately designed to keep conservative users away from Python 3 as a way of providing time for the ecosystem to mature.

Now, if Python 3 had failed to offer a desirable platform, nobody would have cared about this in the slightest. Instead, what we saw was the following:

  • people coming up with great migration guides and utilities independently of the core development team. While six was created by a core developer (Benjamin Peterson), and lib2to3 and the main porting guides are published by the core development team, python-modernize was created by Armin Ronacher (creator of Jinja2 and Flask), while python-future was created by Ed Schofield based on that earlier work. Lennart Regebro has also done stellar work in creating an in-depth guide to porting to Python 3
  • Linux distributions aiming to make Python 2 an optional download and have only Python 3 installed by default
  • commercial Python redistributors and public cloud providers ensuring that Python 3 was included as one of their supported offerings
  • customers approaching operating system vendors and asking for assistance in migrating large proprietary code bases from Python 2 to Python 3
  • more constrained plugin ecosystems that use an embedded Python interpreter (like Blender, gcc, and gdb) either adding Python 3 support, or else migrating entirely from Python 2 to 3
  • developers lamenting the fact that they wanted to use Python 3, but were being blocked by various dependencies being missing, or because they previously used Python 2, and needed to justify the cost of migration to their employer
  • library and framework developers that hadn’t already added Python 3 support for their own reasons being strongly encouraged by their users to offer it (sometimes in the form of code contributions, other times in the form of tracker issues, mailing list posts and blog entries)
  • interesting new implementations/variants like MyPy and MicroPython taking advantage of the removal of legacy behaviour to target the leaner Python 3 language design rather than trying to handle the full backwards compatibility implications of implementing Python 2
  • developers complaining that the core development team wasn’t being aggressive enough in forcing the community to migrate promptly rather than allowing the migration to proceed at its own pace (!)

That last case only appeared around 2014 (~5 years into the migration), and the difference in perspective appears to be an instance of the classic early adopter/early majority divide in platform adoption. The deliberately gentle migration plan was (and is) for the benefit of the late adopters that drive Python’s overall popularity, not the early adopters that make up both the open source development community and the (slightly) broader software development blogging community.

It’s important to keep in mind that Python 2.6 (released October 2008) has long stood as one of the most widely deployed versions of Python, purely through being the system Python in Red Hat Enterprise Linux 6 and its derivatives, and usage of Python 2.4 (released November 2004) remained non-trivial through to at least March 2017 for the same reason with respect to Red Hat Enterprise Linux 5.

I expect there is a similar effect from stable versions of Debian, Ubuntu LTS releases and SUSE Linux Enterprise releases, but (by some strange coincidence) I’m not as familiar with the Python versions and end-of-support dates for those as I am with those for the products sold by my employer ;)

If we weren’t getting complaints from the early adopter crowd about the pace of the migration, then I would have been worried (as it would have indicated they had abandoned Python entirely and moved on to something else).

The final key point to keep in mind is that the available metrics on Python 3 adoption are quite limited, and that remains true regardless of whether we think the migration is going well or going poorly. The three main quantitative options are to analyse user agents on the Python Package Index, declarations of Python 3 support on PyPI and binary installer downloads for Mac OS X and Windows from python.org.

The first of those remains heavily dominated by existing Python 2 users, but the trend in Python 3 usage is still upwards. These metrics are stored as a public data set in Google Big Query, and this post goes over some of the queries that are possible with the available data. The records are incomplete prior to June 2016, but running the query in April 2017 shows downloads from Python 3 clients increasing from around 7% of approximately 430 million downloads in June 2016 to around 12% of approximately 720 million downloads in March 2017.

The second is based on publisher provided package metadata rather than automated version compatibility checking.

Of the top 360 most downloaded packages, 100% offer Python 3 support. Again, the trend is upwards (the number in 2014 was closer to 70%), and I’m not aware of anyone adding Python 3 support, and then removing it as imposing too much maintenance overhead.

The last metric reached the point where Python 3 downloads outnumbered Python 2 downloads (54% vs 46%) back in 2013. Those stats need to be collected manually from the www.python.org server access logs, so I don’t have anything more recent than that.

The Python 3 ecosystem is definitely still the smaller of the two as of April 2017 (by a non-trivial margin), but users that start with Python 3 are able to move parts of their applications and services to Python 2 readily enough if the need arises, and hopefully with a clear idea of which parts of Python 2 are the modern recommended parts that survived the transition to Python 3, and which parts are the legacy cruft that only survives in the latest Python 2.x releases due to backwards compatibility concerns.

For the inverse question relating to the concern that the existing migration plan is too aggressive, see Aren’t you abandoning Python 2 users?.

Is the ultimate success of Python 3 as a platform assured?

Yes, its place as the natural successor to the already dominant Python 2 platform is now assured. Commercial support has long been available from multiple independent vendors, the vast majority of the core components from the Python 2 ecosystem are already available, and the combination of the Python 3.5+ releases and Python’s uptake in the education and data analysis sectors provide assurance of a steady supply of both Python developers, and work for those developers (in the 2016 edition of IEEE’s survey of programming languages, Python was 3rd, trailing only Java and C, overtaking C++ relative to its 2015 position, and both C++ and C# relative to the initial 2014 survey).

For me, with my Linux-and-infrastructure-software bias, the tipping point has been Ubuntu and Fedora successfully making the transition to only having Python 3 in their default install. That change means that a lot of key Linux infrastructure software is now Python 3 compatible, as well as representing not only a significant statement of trust in the Python 3 platform by a couple of well respected organisations (Canonical and Red Hat), but also a non-trivial investment of developer time and energy in performing the migration. This change will also mean that Python 3 will be more readily available than Python 2 on those platforms in the future, and hence more likely to be used as the chosen language variant for Python utility scripts, and hence increase the attractiveness of supporting Python 3 for library and framework developers.

A significant milestone only attained over 2016 and 2017 has been the three largest public cloud providers (Amazon Web Services, Microsoft Azure, and Google Cloud Platform) ensuring that Python 3 is a fully supported development option on their respective platforms, adding to the support already previously available in platforms like Heroku and OpenShift Online.

Specifically in the context of infrastructure, I also see the ongoing migration of OpenStack components from being Python 2 only applications to being Python 3 compatible as highly significant, as OpenStack is arguably one of the most notable Python projects currently in existence in terms of spreading awareness outside the traditional open source and academic environs. In particular, as OpenStack becomes a Python 3 application, then the plethora of regional cloud provider developers and hardware vendor plugin developers employed to work on it will all be learning Python 3 rather than Python 2.

A notable early contribution to adoption has been the education community’s staunch advocacy for the wider Python community to catch up with them in embracing Python 3, rather than confusing their students with occasional recommendations to learn Python 2 directly, rather than learning Python 3 first.

As far as the scientific community goes, they were amongst the earliest adopters of Python 3 - I assume the reduced barriers to learnability were something they appreciated, and the Unicode changes were not a problem that caused them significant trouble.

I think the web development community has certainly had the roughest time of it. Not only were the WSGI update discussions long and drawn out (and as draining as any standards setting exercise), resulting in a compromise solution that at least works but isn’t simple to deal with, but they’re also the most directly affected by the additional challenges faced when working directly with binary data in Python 3. However, even in the face of these issues, the major modern Python web frameworks, libraries and database interfaces do support Python 3, and the return of binary interpolation support in Python 3.5 addressed some of the key concerns raised by the developers of the Twisted networking library.

The adoption of asyncio as the standard framework for asynchronous IO and the subsequent incorporation of first class syntactic support for coroutines have also helped the web development community resolve a long standing issue with a lack of a standard way for web servers and web frameworks to communicate regarding long lived client connections (such as those needed for WebSockets support), providing a clear incentive for migration to Python 3.3+ that didn’t exist with earlier Python 3 versions.
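
As a rough sketch of the kind of long lived connection handling that asyncio standardised (written against the Python 3.5 coroutine syntax; the host, port, and echo behaviour are illustrative only):

import asyncio

async def handle_client(reader, writer):
    # Keep the connection open and echo lines back until the client disconnects
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(line)
        await writer.drain()
    writer.close()

loop = asyncio.get_event_loop()
server = loop.run_until_complete(
    asyncio.start_server(handle_client, "127.0.0.1", 8888))
try:
    loop.run_forever()
finally:
    server.close()
    loop.run_until_complete(server.wait_closed())
    loop.close()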

Python 3 is meant to make Unicode easier, so why is <X> harder?

As of 2015, the Python community as a whole had had more than 15 years to get used to the Python 2 way of handling Unicode. By contrast, for Python 3, we’d only had a production ready release available for just over 5 years, and since some of the heaviest users of Unicode are the web framework developers, and they’d only had a stable WSGI target since the release of 3.2, you could drop that down to just under 5 years of intensive use by a wide range of developers with extensive practical experience in handling Unicode (we have some excellent Unicode developers in the core team, but feedback from a variety of sources is invaluable for a change of this magnitude).

That feedback has already resulted in major improvements in the Unicode support for the Python 3.2, 3.3, 3.4, 3.5, 3.6, and 3.7 releases. With the codecs and email modules being brought into line, the Python 3.4 release was the first one where the transition felt close to being “done” to me in terms of coping with the full implications of a strictly enforced distinction between binary and text data in the standard library, while Python 3.5 revisited some of the earlier design decisions of the Python 3 series and changed some of them based on several years of additional experience. Python 3.6 brought some major changes to the way binary system APIs are handled on Windows, and changes of similar scope in 3.7 improved support on non-Windows systems.

While I’m optimistic that the system boundary handling changes proposed for Python 3.7 will resolve the last of the major issues, I nevertheless expect that feedback process will continue throughout the 3.x series, since “mostly done” and “done” aren’t quite the same thing, and attempting to closely integrate with POSIX systems that may be using ASCII incompatible encodings while using a text model with strict binary/text separation hasn’t really been done before at Python’s scale (the JVM is UTF-16 based, but bypasses most OS provided services, while other tools often choose the approach of just assuming that all bytes are UTF-8 encoded, regardless of what the underlying OS claims).

In addition to the cases where blurring the binary/text distinction really did make things simpler in Python 2, we’re also forcing even developers in strict ASCII-only environments to have to care about Unicode correctness, or else explicitly tell the interpreter not to worry about it. This means that Python 2 users that may have previously been able to ignore Unicode issues may need to account for them properly when migrating to Python 3.
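
A small example of what that explicitness looks like in practice (the header values are made up): even pure ASCII data has to be converted deliberately, rather than relying on an implicit ASCII based conversion as Python 2 would attempt.

header_prefix = b"Content-Type: "   # binary data, e.g. read from a socket
charset = "utf-8"                   # text, e.g. from application configuration

# header_prefix + charset          # raises TypeError in Python 3
message = header_prefix + charset.encode("ascii")
print(message)                      # b'Content-Type: utf-8'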

I’ve written more extensively on both of these topics in Python 3 and ASCII Compatible Binary Protocols and Processing Text Files in Python 3, while PEP 538 and PEP 540 go into detail on the system boundary changes now being proposed for Python 3.7.

Python 3 is meant to fix Unicode, so why is <X> still broken?

The long march from the early assumptions of Anglocentric ASCII based computing to a more global Unicode based future is still ongoing, both for the Python community, and the computing world at large. Computers are still generally much better at dealing with English and other languages with similarly limited character sets than they are with the full flexibility of human languages, even the subset that has been pinned down to a particular binary representation thanks to the efforts of the Unicode Consortium.

While the changes to the core text model in Python 3 did implicitly address many of the Unicode issues affecting Python 2, there are still plenty of Unicode handling issues that require their own independent updates. One recurring problem is that many of these are relatively easy to work around (such as by using a graphical environment rather than the default interactive interpreter to avoid the command line limitations on Windows), but comparatively hard to fix properly (and then get agreement that the proposed fix is a suitable one).

There are also more specific questions covering the state of the WSGI middleware interface for web services, and the issues that can arise when dealing with What’s up with POSIX systems in Python 3?.

Is Python 3 a better language to teach beginning programmers?

I believe so, yes, especially if teaching folks that aren’t native English speakers. However, I also expect a lot of folks will still want to continue on and learn Python 2 even if they learn Python 3 first - I just think that for people that don’t already know C, it will be easier to start with Python 3, and then learn Python 2 (and the relevant parts of C) in terms of the differences from Python 3 rather than learning Python 2 directly and having to learn all those legacy details at the same time as learning to program in the first place.

Note

This answer was written for Python 3.5. For Python 3.6, other potential benefits in teaching beginners include the new f-string formatting syntax, the secrets module, the ability to include underscores to improve the readability of long numeric literals, and the ordering of arbitrary function keyword arguments reliably matching the order in which they’re supplied to the function call.
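
A short sketch of those Python 3.6 conveniences (the values shown are arbitrary):

import secrets

# Underscores in numeric literals (PEP 515) make long constants easier to read
speed_of_light = 299_792_458

# The secrets module gives beginners an obvious "right way" to make tokens
session_token = secrets.token_hex(16)

# Keyword argument order now reliably matches the call site (PEP 468)
def describe(**details):
    return list(details)

print(describe(first=1, second=2, third=3))  # ['first', 'second', 'third']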

As noted above, Python 2 has some interesting quirks due to its C heritage and the way the language has evolved since Guido first created Python in 1991. These quirks then have to be taught to every new Python user so that they can avoid them. The following are examples of such quirks that are easy to demonstrate in an interactive session (and resist the temptation to point out that these can all be worked around - for teaching beginners, it’s the default behaviour that matters, not what experts can instruct the interpreter to do with the right incantations elsewhere in the program).

You can get unexpected encoding errors when attempting to decode values and unexpected decoding errors when attempting to encode them, due to the presence of decode and encode methods on both str and unicode objects, but more restrictive input type expectations for the underlying codecs that then trigger the implicit ASCII based encoding or decoding:

>>> u"\xe9".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>> b"\xe9".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

Python 2 has a limited and inconsistent understanding of character sets beyond those needed to record English text:

>>> è = 1
  File "<stdin>", line 1
    è = 1
    ^
SyntaxError: invalid syntax
>>> print("è")
è

That second line usually works in the interactive interpreter, but won’t work by default in a script:

$ echo 'print("è")' > foo.py
$ python foo.py
  File "foo.py", line 1
SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

The handling of Unicode module names is also inconsistent:

$ echo "print(__name__)" > è.py
$ python -m è
__main__
$ python -c "import è"
  File "<string>", line 1
    import è
           ^
SyntaxError: invalid syntax

Beginners are often surprised to find that Python 2 can’t do basic arithmetic correctly:

>>> 3 / 4
0

Can be bemused by the fact that Python 2 interprets numbers strangely if they have a leading zero:

>>> 0777
511

And may also eventually notice that Python 2 has two different kinds of integer:

>>> type(10) is type(10**100)
False
>>> type(10) is type(10L)
False
>>> 10
10
>>> 10L
10L

The print statement is weirdly different from normal function calls:

>>> print 1, 2, 3
1 2 3
>>> print (1, 2, 3)
(1, 2, 3)
>>> print 1; print 2; print 3
1
2
3
>>> print 1,; print 2,; print 3
1 2 3
>>> import sys
>>> print >> sys.stderr, 1, 2, 3
1 2 3

And the exec statement also differs from normal function calls like eval and execfile:

>>> d = {}
>>> exec "x = 1" in d
>>> d["x"]
1
>>> d2 = {"x":[]}
>>> eval("x.append(1)", d2)
>>> d2["x"]
[1]
>>> with open("example.py", "w") as f:
...     f.write("x = 1\n")
...
>>> d3 = {}
>>> execfile("example.py", d3)
>>> d3["x"]
1

The input builtin has some seriously problematic default behaviour:

>>> input("This is dangerous: ")
This is dangerous: __import__("os").system("echo you are in trouble now")
you are in trouble now
0

The open builtin doesn’t handle non-ASCII files correctly (you have to use codecs.open instead), although this often isn’t obvious on POSIX systems (where passing the raw bytes through the way Python 2 does often works correctly).

You need parentheses to catch multiple exceptions, but forgetting that is an error that passes silently:

>>> try:
...   1/0
... except TypeError, ZeroDivisionError:
...     print("Exception suppressed")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> try:
...     1/0
... except (TypeError, ZeroDivisionError):
...     print("Exception suppressed")
...
Exception suppressed

And if you make a mistake in an error handler, you’ll lose the original error:

>>> try:
...     1/0
... except Exception:
...     logging.exception("Something went wrong")
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'logging' is not defined

Python 2 also presents users with a choice between two relatively unattractive alternatives for calling up to a parent class implementation from a subclass method:

class MySubclass(Example):

    def explicit_non_cooperative(self):
        Example.explicit_non_cooperative(self)

    def explicit_cooperative(self):
        super(MySubclass, self).explicit_cooperative()

List comprehensions are one of Python’s most popular features, yet they can have surprising side effects on the local namespace:

>>> i = 10
>>> squares = [i*i for i in range(5)]
>>> i
4

Python 2 is still a good language despite these flaws, but users that are happy with Python 2 shouldn’t labour under the misapprehension that the language is perfect. We have made mistakes, and Python 3 came about because Guido and the rest of the core development team finally became tired of making excuses for those limitations, and decided to start down the long road towards fixing them instead.

All of the above issues have been addressed by backwards incompatible changes in Python 3. Once we had made that decision, then adding other new features twice (once to Python 3 and again to Python 2) imposed significant additional development effort, although we did do so for a number of years (the Python 2.6 and 2.7 releases were both developed in parallel with Python 3 releases, and include many changes originally created for Python 3 that were backported to Python 2 since they were backwards compatible and didn’t rely on other Python 3 only changes like the new, more Unicode friendly, IO stack).

I’ll give several examples below of how the above behaviours have changed in Python 3 releases, up to and including Python 3.6 (since that’s the currently released version).

In Python 3, the codec related builtin convenience methods are strictly reserved for use with text encodings. Accordingly, text objects no longer even have a decode method, and binary types no longer have an encode method:

>>> u"\xe9".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> b"\xe9".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'

In addition to the above changes, Python 3.4 included additional changes to the codec system to help with more gently easing users into the idea that there are different kinds of codecs, and only some of them are text encodings. It also updates many of the networking modules to make secure connections much simpler.

Python 3 also has a much improved understanding of character sets beyond English:

>>> è = 1
>>> è
1

And this improved understanding extends to the import system:

$ echo "print(__name__)" > è.py
$ python3 -m è
__main__
$ python3 -c "import è"
è

Python 3 has learned how to do basic arithmetic, replaces the surprising C notation for octal numbers with the more explicit alternative supported since Python 2.6 and only has one kind of integer:

>>> 3 / 4
0.75
>>> 0777
  File "<stdin>", line 1
    0777
       ^
SyntaxError: invalid token
>>> 0o777
511
>>> type(10) is type(10**100)
True
>>> 10
10
>>> 10L
  File "<stdin>", line 1
    10L
      ^
SyntaxError: invalid syntax

print is now just an ordinary function that accepts keyword arguments, rather than having its own custom (and arcane) syntax variations (note that controlling the separator between printed elements requires preformatting of the string in Python 2, but was trivial to support directly once print became an ordinary builtin function rather than a separate statement):

>>> print 1, 2, 3
  File "<stdin>", line 1
    print 1, 2, 3
          ^
SyntaxError: invalid syntax
>>> print(1, 2, 3)
1 2 3
>>> print((1, 2, 3))
(1, 2, 3)
>>> print(1); print(2); print(3)
1
2
3
>>> print(1, 2, 3, sep="\n")
1
2
3
>>> print(1, end=" "); print(2, end=" "); print(3)
1 2 3
>>> import sys
>>> print(1, 2, 3, file=sys.stderr)
1 2 3

exec is now more consistent with execfile:

>>> d = {}
>>> exec("x=1", d)
>>> d["x"]
1

Converting print and exec to builtins rather than statements means they now also work natively with utilities that require real function objects (like map and functools.partial), they can be replaced with mock objects when testing and they can be more readily substituted with alternative interfaces (such as replacing raw print statements with a pretty printer or a logging system). It also means they can be passed to the builtin help function without quoting, the same as other builtins.
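
For example (a sketch, with made up names), print can now be partially applied, redirected, and mocked like any other callable:

import functools
import io
from unittest import mock

# Partial application works because print is a real function object
print_csv = functools.partial(print, sep=",")
print_csv(1, 2, 3)                      # 1,2,3

# Output can be redirected per call without touching sys.stdout globally
buffer = io.StringIO()
print("captured", file=buffer)

# And print can be replaced with a mock while testing
with mock.patch("builtins.print") as fake_print:
    print("hello")
fake_print.assert_called_once_with("hello")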

The input builtin now has the much safer behaviour that is provided as raw_input in Python 2:

>>> input("This is no longer dangerous: ")
This is no longer dangerous: __import__("os").system("echo you have foiled my cunning plan")
'__import__("os").system("echo you have foiled my cunning plan")'

The entire IO stack has been rewritten in Python 3 to natively handle Unicode and, in the absence of system configuration errors, to favour UTF-8 by default rather than ASCII. Unlike Python 2, open() in Python 3 natively supports encoding and errors arguments, and the tokenize.open() function automatically handles Python source file encoding cookies.
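
A brief sketch of those interfaces (the file names are placeholders):

import tokenize

# open() takes encoding and errors arguments directly in Python 3
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("naïve café\n")
with open("notes.txt", encoding="utf-8", errors="replace") as f:
    print(f.read())

# tokenize.open() honours PEP 263 coding cookies in Python source files
with open("legacy_module.py", "w", encoding="latin-1") as f:
    f.write("# -*- coding: latin-1 -*-\nname = 'café'\n")
with tokenize.open("legacy_module.py") as src:
    print(src.read())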

Failing to trap an exception is no longer silently ignored:

>>> try:
...     1/0
... except TypeError, ZeroDivisionError:
  File "<stdin>", line 3
    except TypeError, ZeroDivisionError:
                    ^
SyntaxError: invalid syntax

And most errors in exception handlers will now still report the original error that triggered the exception handler:

>>> try:
...     1/0
... except Exception:
...     logging.exception("Something went wrong")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
NameError: name 'logging' is not defined

Note that implicit exception chaining is the thing I miss most frequently when working in Python 2, and the point I consider the single biggest gain Python 3 offers when migrating existing applications - there are few things more irritating when debugging a rare production failure than losing the real problem details due to a secondary failure in a rarely invoked error path.

While you probably don’t want to know how it works internally, Python 3 also provides a much cleaner API for calling up to the parent implementation of a method:

class MySubclass(Example):

    def implicit_cooperative(self):
        super().implicit_cooperative()

And, like generator expressions in both Python 2 and Python 3, list comprehensions in Python 3 no longer have any side effects on the local namespace:

>>> i = 10
>>> squares = [i*i for i in range(5)]
>>> i
10

The above improvements are all changes that couldn’t be backported to a hypothetical Python 2.8 release, since they’re backwards incompatible with some (but far from all) existing Python 2 code, mostly for obvious reasons. The exception chaining isn’t obviously backwards incompatible, but still can’t be backported due to the fact that handling the implications of creating a reference cycle between caught exceptions and the execution frames referenced from their tracebacks involved changing the lifecycle of the variable named in an “as” clause of an exception handler (to break the cycle, those names are automatically deleted at the end of the relevant exception handler in Python 3 - you now need to bind the exception to a different local variable name in order to keep a valid reference after the handler has finished running). The list comprehension changes are also backwards incompatible in non-obvious ways (since not only do they no longer leak the variable, but the way the expressions access the containing scope changes - they’re now full closures rather than running directly in the containing scope).

As documented in PEP 466, the networking security changes were deemed worthy of backporting. In contrast, while it’s perhaps possible to backport the implicit super change, it would need to be separated from the other backwards incompatible changes to the type system machinery (and in that case, there’s no “help improve the overall security of the internet” argument to be made in favour of doing the work).

There are some other notable changes in Python 3 that are of substantial benefit when teaching new users (as well as for old hands), that technically could be included in a Python 2.8 release if the core development team chose to create one, but in practice such a release isn’t going to happen. However, folks interested in that idea may want to check out the Tauthon project, which is a Python 2/3 hybrid language that maintains full Python 2.7 compatibility while backporting backwards compatible enhancements from the Python 3 series.

PEP 3151 means that Python 3.3+ has a significantly more sensible system for catching particular kinds of operating system errors. Here’s the race condition free way to detect a missing file in Python 2.7:

>>> import errno
>>> try:
...     f = open("This does not exist")
... except IOError as err:
...     if err.errno != errno.ENOENT:
...         raise
...     print("File not found")
...
File not found

And here’s the same operation in Python 3.3+:

>>> try:
...     f = open("This does not exist")
... except FileNotFoundError:
...     print("File not found")
...
File not found

(If you’re opening the file for writing, then you can use exclusive mode to prevent race conditions without using a subdirectory - Python 2 has no equivalent. There are many other cases where Python 3 exposes operating system level functionality that wasn’t broadly available when the feature set for Python 2.7 was frozen in April 2010).
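
A minimal sketch of exclusive mode (the file name is a placeholder):

# "x" mode fails if the file already exists, eliminating the race between an
# explicit existence check and the subsequent open call
try:
    with open("job.lock", "x") as f:
        f.write("owned by this process\n")
except FileExistsError:
    print("another process got there first")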

Another common complaint with Python 2 is the requirement to use empty __init__.py files to indicate a directory is a Python package, and the complexity of splitting a package definition across multiple directories. By contrast, here’s an example of how to split a package across multiple directories in Python 3.3+ (note the lack of __init__.py files). While technically this can be backported, the implementation depends on the new pure Python implementation of the import system, which in turn depends on the Unicode friendly IO stack in Python 3, so backporting it is far from trivial:

$ mkdir -p dir1/nspkg
$ mkdir -p dir2/nspkg
$ echo 'print("Imported submodule A")' > dir1/nspkg/a.py
$ echo 'print("Imported submodule B")' > dir2/nspkg/b.py
$ PYTHONPATH=dir1:dir2 python3 -c "import nspkg.a, nspkg.b"
Imported submodule A
Imported submodule B

That layout doesn’t work at all in Python 2 due to the missing __init__.py files, and even if you add them, it still won’t find the second directory:

$ PYTHONPATH=dir1:dir2 python -c "import nspkg.a, nspkg.b"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named nspkg.a
$ touch dir1/nspkg/__init__.py
$ touch dir2/nspkg/__init__.py
$ PYTHONPATH=dir1:dir2 python -c "import nspkg.a, nspkg.b"
Imported submodule A
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named b

That last example actually shows another limitation in Python 2’s error handling, since import failures don’t always show the full name of the missing module. That is fixed in Python 3:

$ PYTHONPATH=dir1 python3 -c "import nspkg.a, nspkg.b"
Imported submodule A
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nspkg.b'

That said: Eric Snow has now backported the Python 3.4 import system to Python 2.7 as importlib2. I’m aware of at least one large organisation using that in production and being quite happy with the results :)

Python 3.3 also included some minor improvements to the error messages produced when functions and methods are called with incorrect arguments.
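
For example (the function name is made up, and the exact wording varies a little between releases), calling a two argument function with a single argument now reports which parameter is missing, rather than just the expected argument count:

def distance(point_a, point_b):
    return abs(point_a - point_b)

try:
    distance(10)
except TypeError as exc:
    # Python 2.7 reports:  distance() takes exactly 2 arguments (1 given)
    # Python 3.3+ reports: distance() missing 1 required positional argument: 'point_b'
    print(exc)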

Out of the box, why is Python 3 better than Python 2?

The feature set for Python 2.7 was essentially locked in April 2010 with the first beta release. Since then, with a very limited number of exceptions related to network security, the Python core development team have only been adding new features directly to the Python 3 series. These new features are informed both by our experience with Python 3 itself, as well as with our ongoing experience working with Python 2 (as they’re still very similar languages).

As Python 2 is a mature, capable language, with a rich library of support modules available from the Python Package Index (including many backports from the Python 3 standard library), there’s no one universally important feature that will provide a compelling argument to switch for existing Python 2 users. Of necessity, existing Python 2 users are those who didn’t find the limitations of Python 2 that led to the creation of Python 3 particularly problematic. It is for the benefit of these users that Python 2 continues to be maintained.

For new users of Python however, Python 3 represents years of additional work above and beyond what was included in the Python 2.7 release. Features that may require third party modules, or simply not be possible at all in Python 2, are provided by default in Python 3. This answer doesn’t attempt to provide an exhaustive list of such features, but does aim to provide an illustrative overview of the kinds of improvements that have been made. The What’s New guides for the Python 3 series (especially the 3.3+ releases that occurred after the Python 2 series was placed in long term maintenance) provide more comprehensive coverage.

While I’ve tried to just hit some highlights in this list, it’s still rather long. The full What’s New documents are substantially longer.

Note

This answer was written for Python 3.5. For Python 3.6, some other notable enhancements include the new f-string formatting syntax, the secrets module, the ability to include underscores to improve the readability of long numeric literals, changes to preserve the order of class namespaces and function keyword arguments, type hints for named variables, and more.

Some changes that are likely to affect most projects are error handling related:

  • the exception hierarchy for operating system errors is now based on what went wrong, rather than which module detected the failure (see PEP 3151 for details).
  • bugs in error handling code no longer hide the original exception (which can be a huge time saver when it happens with hard to reproduce bugs)
  • by default, if the logging system is left unconfigured, warnings and above are written to sys.stderr, while other events are ignored
  • the codec system endeavours to ensure the codec name always appears in the reported error message when the underlying call fails
  • the error messages from failed argument binding now do a much better job of describing the expected signature of the function
  • the socket module takes advantage of the new enum support to include constant names (rather than just numeric values) in the error message output
  • starting in Python 3.5, all standard library modules making system calls should handle EINTR automatically

Unicode is more deeply integrated into the language design, along with a clearer separation between binary and text data:

  • the open() builtin natively supports decoding of text files (rather than having to use codecs.open() instead)
  • the bytes type provides locale independent manipulation of binary data that may contain ASCII segments (the Python 2 str type has locale dependent behaviour for some operations)
  • the codec system has been separated into two tiers. The str.encode(), bytes.decode() and bytearray.decode() methods provide direct access to Unicode text encodings, while the codecs module provides general access to all available codecs, including binary->binary and text->text transforms (in Python 2, all three kinds can be accessed through the convenience methods on the builtin types, creating ambiguity as to the expected return types of the affected methods)
  • data received from the operating system is automatically decoded to text whenever possible (this does cause integration issues in some cases when the OS provides incorrect configuration data, but otherwise allows applications to ignore more cross-platform differences in whether OS APIs natively use bytes or UTF-16)
  • identifiers and the import system are no longer limited to ASCII text (allowing non-English speakers to use names in their native languages when appropriate)
  • Python 3 deliberately has no equivalent to the implicit ASCII based decoding that takes place in Python 2 when an 8-bit str object encounters a unicode object (note that disabling this implicit conversion in Python 2, while technically possible, is not typically feasible, as turning it off breaks various parts of the standard library)
  • Python 3.3+ now correctly handles code points outside the basic multilingual plane without needing to use 4 bytes per code point for all Unicode data (as Python 2 does)

A few new debugging tools are also provided out of the box (a short usage sketch follows this list):

  • faulthandler allows the generation of Python tracebacks for segmentation faults and threading deadlocks (including a -X faulthandler command line option to debug arbitrary scripts)
  • tracemalloc makes it possible to track where objects were allocated and obtain a traceback summary for those locations (this relies on the dynamic memory allocator switching feature added in Python 3.4 and hence cannot be backported to Python 2 without patching the interpreter and building from source)
  • the gc module now provides additional introspection and hook APIs
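
Here is a quick sketch of the first two of those helpers in action (the workload is arbitrary):

import faulthandler
import tracemalloc

faulthandler.enable()      # dump Python tracebacks on segfaults and deadlocks
tracemalloc.start()        # begin recording allocation tracebacks

data = [str(i) * 10 for i in range(100000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)            # top allocation sites, grouped by file and line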

The concurrency support has been improved in a number of ways (see the sketch after this list):

  • The native coroutine syntax added in Python 3.5 is substantially more approachable than the previous “generators-as-coroutines” syntax (as it avoids triggering iterator based intuitions that aren’t actually helpful in the coroutine case)
  • asyncio (and the supporting selectors module) provides greatly enhanced native support for asynchronous IO
  • concurrent.futures provides straightforward support for dispatching work to separate working processes or threads
  • multiprocessing is far more configurable (including the option to avoid relying on os.fork on POSIX systems, making it possible to avoid the poor interactions between threads and os.fork, while still using both multiple processes and threads)
  • the CPython Global Interpreter Lock has been updated to switch contexts based on absolute time intervals, rather than by counting bytecode execution steps (context switches will still occur between bytecode boundaries)

For data analysis use cases, there’s one major syntactic addition:

  • Python 3.5 added a new binary operator (the @ symbol) specifically for use in matrix multiplication, as sketched below
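
As a minimal sketch (the Vec2 class here is purely illustrative, not a real library type), any class can opt in to the new operator by defining __matmul__:

    class Vec2:
        """Toy 2-element vector used only to demonstrate the @ operator."""
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __matmul__(self, other):
            # a @ b dispatches to a.__matmul__(b) in Python 3.5+
            return self.x * other.x + self.y * other.y

    print(Vec2(1, 2) @ Vec2(3, 4))   # 11

In practice, the operator mainly benefits numerical libraries like NumPy, which can use @ in place of their existing dot() style functions.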

Notable additions to the standard library’s native testing capabilities include:

  • the unittest.mock module, previously only available as a third party library
  • a “subtest” feature that allows arbitrary sections of a test to be reported as independent results (including details on what specific values were tested), without having to completely rewrite the test to fit into a parameterised testing framework (see the sketch after this list)
  • a new FAIL_FAST option for doctest that requests stopping the doctest at the first failing test, rather than continuing on to run the remaining tests
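
As a rough sketch of the subtest feature (the values being checked are arbitrary), each failing iteration is reported separately along with the relevant parameters:

    import unittest

    class EvenNumberTests(unittest.TestCase):
        def test_all_even(self):
            for value in (2, 4, 6, 7):
                # each subtest is reported independently, including the
                # failing value, rather than aborting the whole test method
                with self.subTest(value=value):
                    self.assertEqual(value % 2, 0)

    if __name__ == "__main__":
        unittest.main()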

Performance improvements include:

  • significant optimisation work on various text encodings, especially UTF-8, UTF-16 and UTF-32
  • a significantly more memory efficient Unicode representation, especially compared to the unconditional 4 bytes per code point used in Linux distro builds of Python 2
  • a C accelerator module for the decimal module
  • transparent use of other C accelerator modules where feasible (including for pickle and io)
  • the range builtin is now a memory efficient calculated sequence (illustrated in the sketch after this list)
  • the use of iterators or other memory efficient representations for various other builtin APIs that previously returned lists
  • dictionary instances share their key storage when possible, reducing the amount of memory consumed by large numbers of class instances
  • the rewritten implementation of the import system now caches directory listings for a brief time rather than blindly performing stat operations for all possible file names, drastically improving startup performance when network filesystems are present on sys.path
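
For example, the memory impact of the lazy range change listed above can be seen directly (the exact numbers vary by platform and Python version, so treat the comments as indicative only):

    import sys

    lazy = range(10**6)        # constant-size calculated sequence in Python 3
    eager = list(lazy)         # materialising it recreates the Python 2 behaviour

    print(sys.getsizeof(lazy))    # a few dozen bytes
    print(sys.getsizeof(eager))   # several megabytes for the list object alone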

Security improvements include:

  • support for “exclusive mode” when opening files (the 'x' mode for open(), which fails if the file already exists; see the sketch after this list)
  • support for the directory file descriptor APIs that avoid various symlink based attacks
  • switching the default hashing algorithm for key data types to SIPHash
  • providing an “isolated mode” command line switch to help ensure user settings don’t impact execution of particular commands
  • disabling inheritance of file descriptors and Windows handles by child processes by default
  • new multiprocessing options that avoid sharing memory with child processes by avoiding the os.fork system call
  • significant improvements to the SSL module, such as TLS v1.1 and v1.2 support, Server Name Indication support, access to platform certificate stores, and improved support for certificate verification (while these are in the process of being backported to Python 2.7 as part of PEP 466, it is not yet clear when that process will be completed, and those enhancements are already available in Python 3 today)
  • other networking modules now take advantage of many of the SSL module improvements, including making it easier to use the new ssl.create_default_context() to choose settings that default to providing reasonable security for use over the public internet, rather than maximising interoperability (but potentially allowing operation in no longer secure modes)
  • the secrets module added in 3.6
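
A couple of those items can be shown in a short, illustrative sketch (the file name and token length are arbitrary):

    import secrets

    # "x" mode raises FileExistsError instead of truncating an existing file,
    # closing the race condition in "check whether it exists, then create it" code
    with open("session.key", "x") as f:
        # secrets (Python 3.6+) produces cryptographically strong random tokens
        f.write(secrets.token_urlsafe(32))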

Object lifecycle and resource management has also improved significantly:

  • the cyclic garbage collector is now more aggressive in attempting to collect cycles, even those containing __del__ methods. This eliminated some cases where generators could be flagged as uncollectable (and hence effectively leak memory)
  • this means most objects will now have already been cleaned up before the last resort “set module globals to None” step triggers during shutdown, reducing spurious tracebacks when cleanup code runs
  • the new weakref.finalize() API makes it easier to register weakref callbacks without having to worry about managing the lifecycle of the reference itself (see the sketch after this list)
  • many more objects in the standard library now support the context management protocol for explicit lifecycle and resource management
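
A minimal sketch of weakref.finalize() (the Resource class and callback are purely illustrative):

    import weakref

    class Resource:
        pass

    def release(name):
        print("releasing", name)

    res = Resource()
    # the finalizer keeps itself alive until it fires, so there is no need to
    # store and manage the weak reference manually
    weakref.finalize(res, release, "res")
    del res    # "releasing res" is printed here (or at interpreter shutdown at the latest)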

Other quality of life improvements include:

  • __init__.py files are no longer needed to declare packages - if no foo/__init__.py file is present, then all directories named foo on sys.path will be automatically scanned for foo submodules
  • the zero-argument form of the super builtin makes it relatively straightforward to call up to base class method implementations in a way that cooperates with multiple inheritance
  • keyword-only arguments make it much easier to add optional parameters to functions in a way that isn’t error prone or hard to read (see the sketch after this list)
  • the yield from syntax for delegating to subgenerators and iterators (this is a key part of the asyncio coroutine support)
  • iterable unpacking syntax is now more flexible
  • zipapp for bundling pure Python applications into runnable archives
  • enum for creating enumeration types
  • ipaddress for working with both IPv4 and IPv6 addresses
  • pathlib for a higher level filesystem abstraction than the low level interface provided by os.path
  • statistics for a simple high school level statistics library (mean, median, mode, variance, standard deviation, etc)
  • datetime.timestamp() makes it easy to convert a datetime object to a UNIX timestamp
  • time.get_clock_info() and related APIs provide access to a rich collection of cross platform time measurement options
  • venv provides virtual environment support out of the box, in a way that is better integrated with the core interpreter than is possible in Python 2 with only virtualenv available
  • ensurepip ensures pip is available by default in Python 3.4+ installations
  • memoryview is significantly more capable and reliable
  • the caching mechanism for pyc files has been redesigned to better accommodate sharing of Python files between multiple Python interpreters (whether different versions of CPython, or other implementations like PyPy and Jython)
  • as part of that change, implicitly compiled bytecode cache files are written to __pycache__ directories (reducing directory clutter) and are ignored if the corresponding source file has been removed (avoiding obscure errors due to stale cached bytecode files)
  • types.SimpleNamespace and types.MappingProxyType are made available at the Python layer
  • improved introspection support, based on the inspect.signature() API, and its integration into pydoc, allowing accurate signature information to be reported for a much wider array of callables than just actual Python function objects
  • defining __eq__ without also defining __hash__ implicitly disables hashing of instances, avoiding obscure errors when such types were added to dictionaries (you now get an error about an unhashable type when first adding an instance, rather than obscure data driven lookup bugs later)
  • ordered comparisons between objects of different types are now disallowed by default (again replacing obscure data driven errors with explicit exceptions)
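
As a small sketch of the keyword-only argument syntax mentioned above (the function itself is purely illustrative), parameters declared after a bare * can only be passed by name:

    def copy_file(src, dst, *, overwrite=False, follow_symlinks=True):
        """Options after the bare * must be passed as keywords."""
        print(src, dst, overwrite, follow_symlinks)

    copy_file("a.txt", "b.txt", overwrite=True)    # fine
    # copy_file("a.txt", "b.txt", True)            # TypeError: too many positional arguments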

Some more advanced higher order function manipulation and metaprogramming capabilities are also readily available in Python 3:

  • the functools.partialmethod() function makes it straightforward to do partial function application in a way that still allows the instance object to be supplied later as a positional argument
  • the functools.singledispatch() decorator makes it easy to create generic functions that interoperate cleanly with Python’s type system, including abstract base classes (see the sketch after this list)
  • the contextlib.ExitStack class makes it easy to manipulate context managers dynamically, rather than having to rely on explicit use of with statements
  • The new __prepare__ method, and associated functions in the types module, make it possible for metaclasses to better monitor what happens during class body execution (for example, by using an ordered dictionary to record the order of assignments)
  • the updated import system permits easier creation of custom import hooks. In particular, the “source to code” translation step can be overridden, while reusing the rest of the import machinery (including bytecode caching) in a custom import hook
  • the dis.Bytecode API and related functionality makes it easier to work with CPython bytecode
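
As a brief sketch of functools.singledispatch (the example function is purely illustrative), the implementation that runs is chosen based on the type of the first argument:

    from functools import singledispatch

    @singledispatch
    def describe(obj):
        return "some object"

    @describe.register(int)
    def _(obj):
        return "an integer"

    @describe.register(list)
    def _(obj):
        return "a list of %d items" % len(obj)

    print(describe(42))       # an integer
    print(describe([1, 2]))   # a list of 2 items
    print(describe(3.5))      # some object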

Various improvements in Python 3 also permitted some significant documentation improvements relative to Python 2:

  • as the Python 3 builtin sequences are more compliant with their corresponding abstract base classes, it has proved easier to flesh out their documentation to cover all the additional details that have been introduced since those docs were originally written
  • the final removal of the remnants of the legacy import system in Python 3.3 made it feasible to finally document the import system mechanics in the language reference

While many of these features are available in Python 2 with appropriate downloads from the Python Package Index, not all of them are, especially the various changes to the core interpreter and related systems.

While Python 2 does still have a longer tail of esoteric modules available on PyPI, most popular third party modules and frameworks either support both, have alternatives that support Python 3, or can be relatively easily ported using tools like futurize (part of python-future). The 3to2 project, and the pasteurize tool (also part of python-future), offer options for migrating a pure Python 3 application to the large common subset of Python 2 and Python 3 if a critical Python 2 only dependency is identified, and it can’t be invoked in a separate Python 2 process, or cost effectively ported to also run on Python 3.

With Python 3 software collections available for both Red Hat Enterprise Linux and CentOS, Ubuntu including a fully supported Python 3 stack in its latest LTS release, and Continuum Analytics releasing Anaconda3 (a Python 3 based version of their scientific software distribution), the number of cases where using Python 2 is preferable to using Python 3 is dwindling to those where:

  • for some reason, an application absolutely needs to run in the system Python on Red Hat Enterprise Linux or CentOS (for example, depending on an OS level package that isn’t available from PyPI, or needing a complex binary dependency that isn’t available for the Python 3 software collection and not being permitted to add additional dependencies from outside the distro)
  • the particular application can’t tolerate the current integration issues with the POSIX C locale or the Windows command line in environments that actually need full Unicode support
  • there’s a critical Python 2 only dependency that is known before the project even starts, and separating that specific component out to its own Python 2 process while writing the bulk of the application in Python 3 isn’t considered an acceptable architecture

Is Python 3 more convenient than Python 2 in every respect?

Note

This answer was written for Python 3.5, and has only partially been updated for 3.7 and later. For instance, PEP 461, the accepted proposal to restore binary interpolation support in a way that is source and semantically compatible with Python 2 for the use cases we actually want to support in Python 3, was implemented in Python 3.5.

At this point in time, not quite. Python 3.5 comes much closer to this than Python 3.4 (which in turn was closer than 3.3, etc), but there are still some use cases that are more convenient in Python 2 because it handles them by default, where Python 3 needs some additional configuration, or even separate code paths for things that could be handled by a common algorithm in Python 2.

In particular, many binary protocols include ASCII compatible segments, so it is sometimes convenient to treat them as text strings. Python 2 makes this easier in many cases, since the 8-bit str type blurs the boundary between binary and text data. By contrast, if you want to treat binary data like text in Python 3 in a way that isn’t directly supported by the bytes type, you actually need to convert it to text first, and make conscious decisions about encoding issues that Python 2 largely lets you ignore. I’ve written a separate essay specifically about this point: Python 3 and ASCII Compatible Binary Protocols.

Python 3 also requires a bit of additional up front design work when aiming to handle improperly encoded data. This also has its own essay: Processing Text Files in Python 3.

The Python 3 model also required more complex impedance matching on POSIX platforms, which is covered by a separate question: What’s up with POSIX systems in Python 3?.

Until Python 3.4, the Python 3 codec system also didn’t cleanly handle the transform codecs provided as part of the standard library. Python 3.4 includes several changes to the way these codecs are handled that nudge users towards the type neutral APIs in the codecs module when they attempt to use them with the text encoding specific convenience methods on the builtin types.

Another change that has yet to be fully integrated is the switch to producing dynamic views from the keys, values and items methods of dict objects. It currently isn’t easy to implement fully conformant versions of those in pure Python code, so many alternate mapping implementations in Python 3 don’t worry about doing so - they just produce much simpler iterators, equivalent to the iterkeys, itervalues and iteritems methods from Python 2.

Some of the changes in Python 3 designed for the benefit of larger applications (like the increased use of iterators), or for improved language consistency (like changing print to be a builtin function rather than a statement) are also less convenient at the interactive prompt. map, for example, needs to be wrapped in a list call to produce useful output in the Python 3 REPL, since by default it now just creates an iterator, without actually doing any iteration. In Python 2, the fact it combined both defining the iteration and actually doing the iteration was convenient at the REPL, even though it often resulted in redundant data copying and increased memory usage in actual application code.
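
A representative interactive session (the exact object address will differ) shows the difference:

    >>> map(len, ["a", "bc", "def"])
    <map object at 0x7f...>        # an iterator, nothing has been computed yet
    >>> list(map(len, ["a", "bc", "def"]))
    [1, 2, 3]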

Having to type the parentheses when using print is mostly an irritation for Python 2 users that need to retrain their fingers. I’ve personally just trained myself to only use the single argument form (with parentheses) that behaves the same way in both Python 2 and 3, and use string formatting for anything more complex (or else just print the tuple when using the Python 2 interactive prompt). However, I also created a patch that proves it is possible to implement a general implicit call syntax within the constraints of CPython’s parsing rules. Anyone that wishes to do so is free to take that patch and turn it into a full PEP that proposes the addition of a general implicit call syntax to Python 3.5 (or later). While such a PEP would need to address the ambiguity problems noted on the tracker issues (likely by restricting the form of the expression used in an implicit call to only permit unqualified names), it’s notable that the popular IPython interactive interpreter already provides this kind of implicit “autocall” behaviour by default, and many other languages provide a similar “no parentheses, parameters as suffix” syntax for statements that consist of a single function call.

Thanks are due especially to Armin Ronacher for describing several of these issues in fine detail when it comes to the difficulties they pose specifically when writing wire protocol handling code in Python 3. His feedback has been invaluable to me (and others) in attempting to make Python 3 more convenient for wire protocol development without reverting to the Python 2 model that favoured wire protocol development over normal application development (where binary data should exist only at application boundaries and be converted to text or other structured data for internal processing). There’s still plenty of additional improvements that could be made for Python 3.8 and later, though. Possible avenues for improvement previously discussed on python-dev, python-ideas or the CPython issue tracker include:

  • PEP 467 is a draft proposal to clean up some of the legacy of the original Python 3 mutable bytes design. A related change is to better document the tuple-of-ints and list-of-ints behaviour of bytes and bytearray.
  • taking the internal “text encoding” marking system added in Python 3.4 and giving either it or a more general codec type description system a public API for use when developing custom codecs.
  • making it easier to register custom codecs (preferably making use of the native namespace package support added in Python 3.3).
  • introducing a string tainting mechanism that allows strings containing surrogate escaped bytes to be tagged with their encoding assumption and information about where the assumption was introduced. Attempting to process strings with incompatible encoding assumptions would then report both the incompatible assumptions and where they were introduced.
  • creating a “strview” type that uses memoryview to provide a str-like interface to arbitrary binary buffers containing ASCII compatible protocol data.

What’s up with WSGI in Python 3?

The process of developing and updating standards can be slow, frustrating and often acrimonious. One of the key milestones in enabling Python 3 adoption was when the web framework developers and web server developers were able to agree on an updated WSGI 1.1 specification that at least makes it possible to write WSGI applications, frameworks and middleware that support Python 2 and Python 3 from a single source code base, even though it isn’t necessarily easy to do so correctly.

In particular, the Python 2 str type was especially well suited to handling the “data in unknown ASCII compatible encoding” that is common in web protocols, and included in the data passed through from the web server to the application (and vice versa). At this point in time (March 2014), nobody has created a type for Python 3 that is similarly well suited to manipulating ASCII compatible binary protocol data. There certainly wasn’t any such type available for consideration when WSGI 1.1 was standardised in October 2010.

As a result, the “least bad” option chosen for those fields in the Python 3 version of the WSGI protocol was to publish them to the web application as latin-1 decoded strings. This means that applications need to treat these fields as wire protocol data (even though they claim to be text based on their type), encode them back to bytes as latin-1 and then decode them again using the correct encoding (as indicated by other metadata).
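
A hedged sketch of that dance (the helper name is mine, and real code would take the target encoding from the appropriate protocol metadata rather than assuming UTF-8):

    def decoded_path_info(environ, encoding="utf-8"):
        # WSGI delivers PATH_INFO as latin-1 decoded text, so round-trip it
        # through bytes to reinterpret it in the encoding actually used on the wire
        raw = environ["PATH_INFO"].encode("latin-1")
        return raw.decode(encoding)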

The WSGI 1.1 spec is definitely a case of a “good enough” solution winning a battle of attrition. I’m actually hugely appreciative of the web development folks that put their time and energy both into creating the WSGI 1.1 specification and into updating their tools to support it. Like the Python core developers, most of the web development folks weren’t in a position to use Python 3 professionally during the early years of its development, but unlike most of the core developers, the kind of code they write falls squarely into the ASCII compatible binary protocol space where Python 3 still had some significant ground to make up relative to Python 2 in terms of usability (although we’ve also converted our share of such code, just in bringing the standard library up to scratch).

What’s up with POSIX systems in Python 3?

Note

This answer was written for Python 3.5. See PEP 538 and PEP 540 for discussion of some key changes now being considered for Python 3.7.

The fact that the Python 2 text model was essentially the POSIX text model with Unicode support bolted on to the side meant that interoperability between Python 2 and even misconfigured POSIX systems was generally quite straightforward - if the implicit decoding as ASCII never triggered (which was likely for code that only included 8-bit strings and never explicitly decoded anything as Unicode), non-ASCII data would silently pass through unmodified.

One option we considered was to just assume everything was UTF-8 by default, similar to the choice made by the Windows .NET platform, the GNOME GUI toolkit and other systems. However, we decided that posed an unacceptable risk of silently corrupting users’ data on systems that were properly configured to use an encoding other than UTF-8 (this concern was raised primarily by contributors based in Europe and Asia).

This was a deliberate choice of attempting to be compatible with other software on the end user’s system at the cost of increased sensitivity to configuration errors in the environment and differences in default behaviour between environments with different configurations. There are also current technical limitations in the reference interpreter’s startup code that force us to rely on the locale encoding claimed by the operating system on POSIX systems.

PEP 383 added the surrogateescape error handler to cope with the fact that the configuration settings on POSIX systems aren’t always a reliable guide to the actual encoding of the data you encounter. One of the most common causes of problems is the seriously broken default encoding for the default locale in POSIX (due to the age of the ANSI C spec where that default is defined, that default is ASCII rather than UTF-8). Bad default environments and environment forwarding in ssh sessions are another source of problems, since an environment forwarded from a client is not a reliable guide to the server configuration, and if the ssh environment defaults to the C/POSIX locale, it will tell Python 3 to use ASCII as the default encoding rather than something more appropriate.
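
As a small sketch of how the error handler behaves (the byte values are arbitrary), undecodable bytes are smuggled through as surrogate code points and can be losslessly re-encoded later:

    raw = b"caf\xe9"                                  # latin-1 data, not valid ASCII
    text = raw.decode("ascii", errors="surrogateescape")
    assert text == "caf\udce9"                        # the bad byte becomes a lone surrogate
    assert text.encode("ascii", errors="surrogateescape") == raw   # lossless round trip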

When surrogateescape was added, we considered enabling it for every operating system interface by default (including file I/O), but the point was once again made that this idea posed serious risks for silent data corruption on Asian systems configured to use Shift-JIS, ISO-2022, or other ASCII-incompatible encodings (European users were generally in a safer position on this one, since Europe has substantially lower usage of ASCII incompatible codecs than Asia does).

This means we’ve been judiciously adding surrogateescape to interfaces as we decide the increase in convenience justifies any increased risk of data corruption. For Python 3.5, this is also being applied to sys.stdin and sys.stdout on POSIX systems that claim that we should be using ascii as the default encoding. Such a result almost certainly indicates a configuration error in the environment, but using ascii+surrogateescape in such cases should make for a more usable result than the current approach of ascii+strict. There’s still some risk of silent data corruption in the face of ASCII incompatible encodings, but the assumption is that systems that are configured with a non-ASCII compatible encoding should already have relatively robust configurations that avoid ever relying on the default POSIX locale.

This is an area where we’re genuinely open to the case being made for different defaults, or additional command line or environment variable configuration options. POSIX is just seriously broken in this space, and we’re having to trade-off user convenience against the risk of silent data corruption - that means the “right answer” is not obvious, and any PEP proposing a change needs to properly account for the rationale behind the current decision (in particular, it has to account for the technical limitations in the startup code that create the coupling to the default locale encoding reported by the operating system, which may require a change on the scale of PEP 432 to actually fix properly).

What changes in Python 3 have been made specifically to simplify migration?

The biggest change made specifically to ease migration from Python 2 was the reintroduction of Unicode literals in Python 3.3 (in PEP 414). This allows developers supporting both Python 2 and 3 in a single code base to easily distinguish binary literals, text literals and native strings, as b"binary" means bytes in Python 3 and str in Python 2, u"text" means str in Python 3.3+ and unicode in Python 2, while "native" means str in both Python 2 and 3.

The restoration of binary interpolation support in Python 3.5 was designed in such a way as to also make a lot of 8-bit string interpolation operations in Python 2 code “just work” in Python 3.5+.
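
For example, wire protocol style formatting like the following (the request line is illustrative) behaves the same way in Python 2 and in Python 3.5+:

    path = b"/index.html"
    request = b"GET %s HTTP/1.1\r\n" % (path,)
    # b'GET /index.html HTTP/1.1\r\n' under both Python 2 and Python 3.5+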

A smaller change to simplify migration was the reintroduction of the non-text encoding codecs (like hex_codec) in Python 3.2, and the restoration of their convenience aliases (like hex) in Python 3.4. The codecs.encode and codecs.decode convenience functions allow them to be used in a single source code base (since those functions have been present and covered by the test suite since Python 2.4, even though they were only added to the documentation recently).
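
A brief single-source sketch (the payload is arbitrary):

    import codecs

    data = b"example payload"
    encoded = codecs.encode(data, "hex_codec")       # works on Python 2.4+ and 3.2+
    assert codecs.decode(encoded, "hex_codec") == data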

The WSGI update in PEP 3333 also standardised the Python 3 interface between web servers and frameworks, which is what allowed the web frameworks to start adding Python 3 support with the release of Python 3.2.

A number of standard library APIs that were originally either binary only or text only in Python 3 have also been updated to accept either type. In these cases, there is typically a requirement that the “alternative” type be strict 7-bit ASCII data - use cases that need anything more than that are expected to do their encoding or decoding at the application boundary rather than relying on the implicit encoding and decoding provided by the affected APIs. This is a concession in the Python 3 text model specifically designed to ease migration in “pure ASCII” environments - while relying on it can reintroduce the same kind of obscure data driven failures that are seen with the implicit encoding and decoding operations in Python 2, these APIs are at least unlikely to silently corrupt data streams (even in the presence of data encoded using a non-ASCII compatible encoding).

What other changes have occurred that simplify migration?

The original migration guides unconditionally recommended running an application’s test suite using the -3 flag in Python 2.6 or 2.7 (to ensure no warnings were generated), and then using the 2to3 utility to perform a one-time conversion to Python 3.

That approach is still a reasonable choice for migrating a fully integrated application that can completely abandon Python 2 support at the time of the conversion, but is no longer considered a good option for migration of libraries, frameworks and applications that want to add Python 3 support without losing Python 2 support. The approach of running 2to3 automatically at install time is also no longer recommended, as it creates an undesirable discrepancy between the deployed code and the code in source control that makes it difficult to correctly interpret any reported tracebacks.

Instead, the preferred alternative in the latter case is now to create a single code base that can run under both Python 2 and 3. The six compatibility library can help with several aspects of that, and the python-modernize utility is designed to take existing code that supports older Python versions and update it to run in the large common subset of Python 2.6+ and Python 3.3+ (or 3.2+ if the unicode literal support in Python 3.3 isn’t needed).

The “code modernisation” approach also has the advantage of being able to be done incrementally over several releases, as failures under Python 3 can be addressed progressively by modernising the relevant code, until eventually the code runs correctly under both versions. Another benefit of this incremental approach is that this modernisation activity can be undertaken even while waiting for other dependencies to add Python 3 support.

More recently, the python-future project was created to assist those developers that would like to primarily write Python 3 code, but would also like to support their software on Python 2 for the benefit of potential (or existing) users that are not themselves able to upgrade to Python 3.

The addition of the pylint --py3k flag was designed to make it easier for folks to ensure that code migrated to the common subset of Python 2 and Python 3 remained there rather than reintroducing Python 2 only constructs.

The landing page for the Python documentation was also switched some time ago to display the Python 3 documentation by default, although deep links still refer to the Python 2 documentation in order to preserve the accuracy of third party references (see PEP 430 for details).

What future changes in Python 3 are expected to further simplify migration?

Most of the changes designed to further simplify migration landed in Python 3.5.

One less obviously migration related aspect of those changes is that the new gradual typing system is designed to allow Python 2 applications to be typechecked as if they were Python 3 applications, and hence many potential porting problems can be detected even if they’re not covered by tests, or the test suite can’t yet be run on Python 3.

Didn’t you strand the major alternative implementations on Python 2?

Cooperation between the major implementations (primarily CPython, PyPy, Jython, IronPython, but also a few others) has never been greater than it has been in recent years. The core development community that handles both the language definition and the CPython implementation includes representatives from all of those groups.

The language moratorium that severely limited the kinds of changes permitted in Python 3.2 was a direct result of that collaboration - it gave the other implementations breathing room to catch up to Python 2.7. That moratorium was only lifted for 3.3 with the agreement of the development leads for those other implementations. Significantly, one of the most disruptive aspects of the 3.x transition for CPython and PyPy (handling all text as Unicode data) was already the case for Jython and IronPython, as they use the string model of the underlying JVM and CLR platforms.

We have also instituted new guidelines for CPython development which require that new standard library additions be granted special dispensation if they are to be included as C extensions without an API compatible Python implementation.

Python 3 specifically introduced ResourceWarning, which alerts developers when they are relying on the garbage collector to clean up external resources like sockets. This warning is off by default, but switched on automatically by many test frameworks. The goal of this warning is to detect any cases where __del__ is being used to clean up a resource, such as a file or socket or database connection. Such cases are then updated to use either explicit resource management (via a with or try statement) or else switched over to weakref if non-deterministic clean-up is considered appropriate (the latter is quite rare in the standard library). The aim of this effort is specifically to ensure that the entire standard library will run correctly on Python implementations that don’t use refcounting for object lifecycle management.

Finally, Python 3.3 converted the bulk of the import system over to pure Python code so that all implementations can finally start sharing a common import implementation. Some work will be needed from each implementation to work out how to bootstrap that code into the running interpreter (this was one of the trickiest aspects for CPython), but once that hurdle is passed all future import changes should be supported with minimal additional effort.

All that said, there’s often a stark difference in the near term goals of the core development team and the developers for other implementations. Criticism of the Python 3 project has been somewhat vocal from a number of PyPy core developers, and that makes sense when you consider that one of the core aims of PyPy is to provide a better runtime for existing Python applications. However, despite those reservations, PyPy was still the first of the major alternative implementations to support Python 3 (with the initial release of their PyPy3 runtime in June 2014). The initial PyPy3 release targeted Python 3.2 compatibility, but the changes needed to catch up on subsequent Python 3 releases are relatively minor compared to the changes between Python 2 and Python 3, and the PyPy team received a funded development grant from Mozilla to bring PyPy3 at least up to Python 3.5 compatibility. Work also continues on another major compatibility project for PyPy, numpypy, which aims to integrate PyPy with the various components of the scientific Python stack.

Note

The info below on Jython and IronPython is currently quite dated. This section should also be updated to mention the new Python 3 only bytecode-focused implementations targeting the JVM (BeeWare’s VOC), and JavaScript runtimes (BeeWare’s Batavia)

Jython’s development efforts are currently still focused on getting their currently-in-beta Python 2.7 support to a full release, and there is also some significant work happening on JyNI (which, along the same lines as PyPy’s numpypy project, aims to allow the use of the scientific Python stack from the JVM).

The IronPython folks have started working on a Python 3 compatible version, but there currently isn’t a target date for a release. IronClad already supports the use of scientific libraries from IronPython.

One interesting point to note for Jython and IronPython is that the changes to the Python 3 text model bring it more into line with the text models of the JVM and the CLR. This may mean that projects updated to run in the common subset of Python 2 and 3 will be more likely to run correctly on Jython and IronPython, and once they implement Python 3 support, the compatibility of Python 3 only modules should be even better.

Aren’t you abandoning Python 2 users?

We’re well aware of this concern, and have taken what steps we can to mitigate it.

First and foremost is the extended maintenance period for the Python 2.7 release. We knew it would take some time before the Python 3 ecosystem caught up to the Python 2 ecosystem in terms of real world usability. Thus, the extended maintenance period on 2.7 to ensure it continues to build and run on new platforms. While python-dev maintenance of 2.7 was originally slated to revert to security-fix only mode in July 2015, Guido extended that out to 2020 at PyCon 2014. We’re now working with commercial redistributors to help ensure the appropriate resources are put in place to actually meet that commitment. In addition to the ongoing support from the core development team, 2.6 will still be supported by enterprise Linux vendors until at least 2020, while Python 2.7 will be supported until at least 2024.

We have also implemented various mechanisms which are designed to ease the transition from Python 2 to Python 3. The -3 command line switch in Python 2.6 and 2.7 makes it possible to check for cases where code is going to change behaviour in Python 3 and update it accordingly.

The automated 2to3 code translator can handle many of the mechanical changes in updating a code base, and the python-modernize variant performs a similar translation that targets the (large) common subset of Python 2.6+ and Python 3 with the aid of the six compatibility module, while python-future does something similar with its futurize utility.

PEP 414 was implemented in Python 3.3 to restore support for explicit Unicode literals primarily to reduce the number of purely mechanical code changes being imposed on users that are doing the right thing in Python 2 and using Unicode for their text handling.

One outcome of some of the discussions at PyCon 2014 was the pylint --py3k utility to help make it easier for folks to migrate software incrementally and opportunistically, first switching to the common subset running on Python 2.7, before migrating to the common subset on Python 3.

So far we’ve managed to walk the line by persuading our Python 2 users that we aren’t going to leave them in the lurch when it comes to appropriate platform support for the Python 2.7 series, thus allowing them to perform the migration on their own schedule as their dependencies become available, while doing what we can to ease the migration process so that following our lead remains the path of least resistance for the future evolution of the Python ecosystem.

PEP 404 (yes, the choice of PEP number is deliberate - it was too good an opportunity to pass up) was created to make it crystal clear that python-dev has no intention of creating a 2.8 release that backports 2.x compatible features from the 3.x series. After you make it through the opening Monty Python references, you’ll find the explanation that makes it unlikely that anyone else will take advantage of the “right to fork” implied by Python’s liberal licensing model: we had very good reasons for going ahead with the creation of Python 3, and very good reasons for discontinuing the Python 2 series. We didn’t decide to disrupt an entire community of developers just for the hell of it - we did it because there was a core problem in the language design, and a backwards compatibility break was the only way we could find to solve it once and for all.

For the inverse question relating to the concern that the existing migration plan is too conservative, see But uptake is so slow, doesn’t this mean Python 3 is failing as a platform?.

What would it take to make you change your minds about the current plan?

With both the Debian/Ubuntu and Fedora/RHEL/CentOS ecosystems well advanced in their migration plans, public cloud providers offering Python 3 in addition to Python 2, major commercial end users like Facebook, Google and Dropbox migrating, and the PSF’s own major services like python.org and the Python Package Index switching to Python 3, the short answer here is “That’s not going to happen”.

While a crash in general Python adoption might have made us change our minds, Python ended up working its way into more and more niches despite the Python 3 transition, so the only case that could be made is “adoption would be growing even faster without Python 3 in the picture”, which is a hard statement to prove (particularly when we suspect that at least some of the growth in countries where English is not the primary spoken language is likely to be because of Python 3 rather than in spite of it, and that the Python 3 text model is in a much better position to serve as a bridge between the POSIX text model and the JVM and CLR text models than the Python 2 model ever was).

Another scenario that would have made us seriously question our current strategy is if professional educators had told us that Python 2 was a better teaching language, but that didn’t happen - they’re amongst Python 3’s more vocal advocates, encouraging the rest of the community to “just upgrade already”.

Wouldn’t a Python 2.8 release help ease the transition?

In a word: no. In several words: maybe, but at such a high cost, the core development team consider it a much better idea to invest that effort in improving Python 3, migration tools and helping to port libraries and applications (hence why credible contributors can apply to the PSF for a grant to help port key libraries to Python 3, but PSF funding isn’t available for a Python 2.8 release).

The rationale for this proposal appears to be that if backporting Python 3 changes to Python 2.6 and 2.7 was a good idea to help Python 3 adoption, then continuing to do so with a new Python 2.8 release would also be a good idea.

What this misses is that those releases were made during a period when the core development team was still in the process of ensuring that Python 3 was in a position to stand on its own as a viable development platform. We didn’t want conservative users that were currently happy with Python 2 to migrate at that point, as we were still working out various details to get it back to feature parity with Python 2. One of the most notable of those was getting a usable WSGI specification back in 3.2, and another being the restoration of Unicode literals in 3.3 to help with migration from Python 2.

If we hadn’t considered Python 3.2 to be at least back to parity with Python 2.7, that is when we would have decided to continue on to do a Python 2.8 release. We’re even less inclined to do so now that Python 3 has several additional years of feature development under its belt relative to the Python 2 series.

There are parts of the Python 3 standard library that are also useful in Python 2. In those cases, they’re frequently available as backports on the Python Package Index (including even a backport of the new asynchronous IO infrastructure).

There are also various language level changes that are backwards compatible with Python 2.7, and the Tauthon project was started specifically to create a hybrid runtime implementation that expanded the “common subset” of Python 2 & 3 to include those additional features.

However, I think a key point that is often missed in these discussions is that the adoption cycles for new versions of the core Python runtime have always been measured in years due to the impact of stable platforms like Red Hat Enterprise Linux.

Consider the following map of RHEL/CentOS versions to Python versions (release date given is the Python release date, and Python 2.5 was skipped due to RHEL5 being published not long before it was released in September 2006):

  • 4 = 2.3 (first released July 2003)
  • 5 = 2.4 (first released November 2004)
  • 6 = 2.6 (first released October 2008)
  • 7 = 2.7 (first released July 2010)

Now consider these Twisted compatibility requirements (going by the modification dates on the tagged INSTALL file):

  • 10.0 dropped Python 2.3 in March 2010
  • 10.2 dropped Python 2.4 (Windows) in November 2010
  • 12.0 dropped Python 2.4 (non-Windows) in February 2012
  • 12.2 dropped Python 2.5 in August 2012
  • 15.4 dropped Python 2.6 in September 2015

Python 2.6 compatibility was still required more than 7 years after its original release, and didn’t get dropped until well after the first CentOS 7 release was available (not to mention the earlier release of a Python 2.7 SCL).

I believe Twisted has one of the most conservative user bases in the Python community, and I consider this one of the main reasons we see this general pattern of only dropping support for an older release 6-7 years after it was first made available. That’s also why I considered the Twisted developers a key audience for any increases in the scope of single source support in Python 3.5 (and their support for the idea was certainly one of the factors behind the planned return of binary interpolation support).

That’s the way the path to Python 3 will be smoothed at this point: by identifying blockers to migration and knocking them down, one by one. The PSF has helped fund the migration of some key libraries. Barry Warsaw drove a fair amount of Python 3 migration work for Ubuntu at Canonical. Victor Stinner is working hard to encourage and support the OpenStack migration. I have been offering advice and encouragement to Bohuslav Kabrda (the main instigator of Fedora’s migration to Python 3), Petr Viktorin, and other members of Red Hat’s Python maintenance team, as well as helping out with Fedora policy recommendations on supporting parallel Python 2 and 3 stacks (I have actually had very little to do with Red Hat’s efforts to support Python 3 overall, as I haven’t needed to. Things like Python 3 support in Red Hat Software Collections and OpenShift Online happened because other folks at Red Hat made sure they happened). Guido approved the restoration of Unicode literal support after web framework developers realised they couldn’t mask that particular change for their users, and he has also approved the restoration of binary interpolation support. I went through and made the binary transform codecs that had been restored in Python 3.2 easier to discover and use effectively in Python 3.4. R. David Murray put in a lot of time and effort to actually handle Unicode sensibly in the email module, Brett Cannon has been updating the official migration guide based on community feedback, etc, etc (I’m sure I’m missing a bunch of other relevant changes).

Outside of CPython and its documentation, Benjamin Peterson published the six compatibility module, Lennart Regebro put together his excellent guide for porting, Armin Ronacher created python-modernize and Ed Schofield created python-future. Multiple folks have contributed patches to a wide variety of projects to allow them to add Python 3 support.

Aren’t you concerned Python 2 users will abandon Python over this?

Certainly - a change of this magnitude is sufficiently disruptive that many members of the Python community are legitimately upset at the impact it has had on them.

This is particularly the case for users that had never personally been bitten by the broken Python 2 Unicode model, either because they work in an environment where almost all data is encoded as ASCII text (increasingly uncommon, but still not all that unusual in English speaking countries) or else in an environment where the appropriate infrastructure is in place to deal with the problem even in Python 2 (for example, web frameworks hide most of the problems with the Python 2 approach from their users).

Another category of users are upset that we chose to stop adding new features to the Python 2 series, and have been quite emphatic that attempts to backport features (other than via PyPI modules like unittest2, contextlib2 and configparser) are unlikely to receive significant support from python-dev. As long as they don’t attempt to present themselves as providing official Python releases, we’re not opposed to such efforts - it’s merely the case that (outside a few specific exceptions like PEP 466) we aren’t interested in doing them ourselves, and are unlikely to devote significant amounts of time to assisting those that are interested.

A third category of user negatively affected by the change are those users that deal regularly with binary data formats and had mastered the idiosyncrasies of the Python 2 text model to the point where writing correct code using that model was effortless. The kinds of hybrid binary-or-text APIs that the str type made easy in Python 2 can be relatively awkward to write and maintain in Python 3 (or in the common subset of the two languages). While native Python 3 code can generally simply avoid defining such APIs in the first place, developers porting libraries and frameworks from Python 2 generally have little choice, as they have to continue to support both styles of usage in order to allow their users to effectively port to Python 3.

However, we have done everything we can to make migrating to Python 3 the easiest exit strategy for Python 2, and provided a fairly leisurely time frame for the user community to make the transition. Full maintenance of Python 2.7 has now been extended to 2020, source only security releases may continue for some time after that, and, as noted above, I expect enterprise Linux vendors and other commercial Python redistributors to continue to provide paid support for some time after community support ends.

Essentially, the choices we have set up for Python 2 users attracted by Python 3 features that are technically backwards compatible with Python 2 are:

  • Live without the features for the moment and continue to use Python 2.7
  • For standard library modules/features, use a backported version from PyPI (or create a backport if one doesn’t already exist and the module doesn’t rely specifically on Python 3 only language features)
  • Migrate to Python 3 themselves
  • Fork Python 2 to add the missing features for their own benefit
  • Migrate to a language other than Python

The first three of those approaches are all fully supported by python-dev. Many standard library additions in Python 3 started as modules on PyPI and thus remain available to Python 2 users. For other cases, such as unittest or configparser, the respective standard library maintainer also maintains a PyPI backport.

The fourth choice exists as the Tauthon project, so it will be interesting to see if that gains significant traction with developers and platform providers.

The final choice would be unfortunate, but we’ve done what we can to make the other alternatives (especially the first three) more attractive.

Doesn’t this make Python look like an immature and unstable platform?

Again, many of us in core development are aware of this concern, and have been taking active steps to ensure that even the most risk averse enterprise users can feel comfortable in adopting Python for their development stack, despite the current transition.

Obviously, much of the content in the answers above regarding the viability of Python 2 as a development platform, with a clear future migration path to Python 3, is aimed at enterprise users. Government agencies and large companies are the environments where risk management tends to come to the fore, as the organisation has something to lose. The start up and open source folks are far more likely to complain that the pace of Python core development is too slow.

The main change to improve the perceived stability of Python 3 is that we’ve started making greater use of the idea of “documented deprecation”. This is exactly what it says: a pointer in the documentation to say that a particular interface has been replaced by an alternative we consider superior that should be used in preference for new code. We have no plans to remove any of these APIs from Python - they work, there’s nothing fundamentally wrong with them, there is just an updated alternative that was deemed appropriate for inclusion in the standard library.

Programmatic deprecation is now reserved for cases where an API or feature is considered so fundamentally flawed that using it is very likely to cause bugs in user code. An example of this is the deeply flawed contextlib.nested API which encouraged a programming style that would fail to correctly close resources on failure. For Python 3.3, it was finally replaced with a superior incremental contextlib.ExitStack API which supports similar functionality without being anywhere near as error prone.
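
A minimal sketch of the incremental style that contextlib.ExitStack supports (the file names are arbitrary):

    from contextlib import ExitStack

    filenames = ["a.txt", "b.txt", "c.txt"]
    with ExitStack() as stack:
        # unlike the old contextlib.nested, files opened earlier are still
        # closed correctly if opening a later one fails
        files = [stack.enter_context(open(name)) for name in filenames]
        # ... work with the open files here ...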

Secondly, code level deprecation warnings are now silenced by default. The expectation is that test frameworks and test suites will enable them (so developers can fix them), while they won’t be readily visible to end users of applications that happen to be written in Python. (This change can actually cause problems with ad hoc user scripts breaking when upgrading to a newer version of Python, but the longevity of Python 2.7 actually works in our favour on that front)

Finally, and somewhat paradoxically, the introduction of provisional APIs in Python 3 is a feature largely for the benefit of enterprise users. This is a documentation marker that allows us to flag particular APIs as potentially unstable. It grants us a full release cycle (or more) to ensure that an API design doesn’t contain any nasty usability traps before declaring it ready for use in environments that require rock solid backwards compatibility guarantees.

Why wasn’t I consulted?

Technically, even the core developers weren’t consulted: Python 3 happened because the creator of the language, Guido van Rossum, wanted it to happen, and Google paid for him to devote half of his working hours to leading the development effort.

In practice, Guido consults extensively with the other core developers, and if he can’t persuade even us that something is a good idea, he’s likely to back down. In the case of Python 3, though, it is our collective opinion that the problems with Unicode in Python 2 are substantial enough to justify a backwards compatibility break in order to address them, and that continuing to maintain both versions in parallel indefinitely would not be a good use of limited development resources.

We as a group also continue to consult extensively with the authors of other Python implementations, authors of key third party frameworks, libraries and applications, our own colleagues and other associates, employees of key vendors, Python trainers, attendees at Python conferences, and, well, just about anyone that cares enough to sign up to the python-dev or python-ideas mailing lists or add their Python-related blog to the Planet Python feed, or simply discuss Python on the internet such that the feedback eventually makes its way back to a place where we see it.

Some notable changes within the Python 3 series, specifically PEP 3333 (which updated the Web Server Gateway Interface to cope with the Python 3 text model) and PEP 414 (which restored support for explicit Unicode literals) have been driven primarily by the expressed needs of the web development community in order to make Python 3 better meet their needs.

The restoration of binary interpolation support in Python 3.5 is similarly intended to increase the size of the common subset of Python 2 and Python 3 in a way that makes it easier for developers to migrate to the new version of the language (as well as being a useful new feature for Python 3 in its own right).

If you want to keep track of Python’s development and get some idea of what’s coming down the pipe in the future, it’s all available on the internet.

But <name> says Python 3 was a waste of time/didn’t help/made things worse!

One previously popular approach to saying why Python 2 should be used over Python 3 even for new projects was to appeal to the authority of someone like Armin Ronacher (creator of Jinja2, Flask, Click, etc) or Greg Wilson (creator of Software Carpentry).

The piece missing from that puzzle is the fact that Guido van Rossum, the creator of Python, and every core developer of CPython, have not only been persuaded that the disruption posed by the Python 3 transition is worth the effort, but have been busily adding the features we notice missing from both Python 2 and 3 solely to the Python 3 series since the feature freeze for Python 2.7 back in 2010.

Where’s the disconnect? Well, it arises in a couple of ways. Firstly, when creating Python 3, we deliberately made it worse than Python 2 in particular areas. That sounds like a ridiculous thing for a language design team to do, but programming language design is a matter of making trade-offs and if you try to optimise for everything at once, you’ll end up with an unreadable mess that isn’t optimised for anything. In many of those cases, we were trading problems we considered unfixable for ones that could at least be solved in theory, even if they haven’t been solved yet.

In Armin’s case, the disconnect was that his primary interest is in writing server components for POSIX systems, and cross-platform command line clients for those applications. This runs into issues, because Python 3’s operating system integration could get confused in a few situations:

  • on POSIX systems (other than Mac OS X), in the default C locale
  • on POSIX systems (other than Mac OS X), when ssh environment forwarding configures a server session with the client locale and the client and server have differing locale settings
  • at the Windows command line

These issues stem from the fact that where Python 2 decodes 8-bit data to Unicode text lazily at operating system boundaries, Python 3 does so eagerly. This change was made to better accommodate Windows systems (where the 8-bit APIs use the mbcs codec, rendering them effectively useless), but came at the cost of being more reliant on receiving correct encoding and decoding advice from the operating system. Operating systems are normally pretty good about providing that info, but they fail hard in the above scenarios.

In almost purely English environments, none of this causes any problems, just as the Unicode handling defects in Python 2 tend not to cause problems in such environments. In the presence of non-English text however, we had to decide between cross-platform consistency (i.e. assuming UTF-8 everywhere), and attempting to integrate correctly with the encoding assumptions of other applications on the same system. We opted for the latter approach, primarily due to the dominance of ASCII incompatible encodings in East Asian countries (ShiftJIS, ISO-2022, GB-18030, various CJK codecs, etc). For ordinary user space applications, including the IPython Notebook, this already works fine. For other code, we’re now working through the process of assuming UTF-8 as the default binary encoding when the operating system presents us with dubious encoding recommendations (that will be a far more viable assumption in 2016 than it was in 2008).

For anyone that would like to use Python 3, but is concerned by Armin Ronacher’s comments, the best advice I can offer is to use his libraries to avoid those problems. Seriously, the guy’s brilliant - you’re unlikely to go seriously wrong in deciding to use his stuff when it applies to your problems. It offers a fine developer experience, regardless of which version of Python you’re using. His complaints are about the fact that writing those libraries became more difficult in Python 3 in some respects, but he gained the insight needed to comprehensively document those concerns the hard way: by porting his code. His feedback on the topic was cogent and constructive enough that it was cited as one of the reasons he received a Python Software Foundation Community Service Award in October 2014.

The complaints from the Software Carpentry folks (specifically Greg Wilson) were different. Those were more about the fact that we hadn’t done a very good job of explaining the problems that the Python 3 transition was designed to fix. This is an example of something Greg himself calls “the curse of knowledge”: experts don’t necessarily know what other people don’t know. In our case, we thought we were fixing bugs that tripped up everyone. In reality, what we were doing was fixing things that we thought were still too hard, even with years (or decades in some cases) of Python experience. We’d waste memory creating lists that we then just iterated over and threw away, we’d get our Unicode handling wrong so our applications broke on Windows narrow builds (or just plain broke the first time they encountered a non-ASCII character or text in multiple encodings), we’d lose rare exception details because we had a latent defect in an error handler. We baked fixes for all of those problems (and more) directly into the design of Python 3, and then became confused when other Python users tried to tell us Python 2 wasn’t broken and they didn’t see what Python 3 had to offer them. So we’re now in a position where we’re having to unpack years (or decades) of experience with Python 2 to explain why we decided to put that into long term maintenance mode and switch our feature development efforts to Python 3 instead.
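
As one illustration of that last point, here's a hypothetical error handler with exactly that kind of latent defect (the function and its argument are invented for the example). Python 2 reports only the secondary NameError, discarding the original failure, while Python 3's implicit exception chaining reports both:

    def load_config(settings):
        try:
            return settings["config_path"]
        except KeyError:
            # Latent defect: "err" is never defined, so the handler itself fails.
            # Python 2 shows only the resulting NameError; Python 3 also shows
            # the original KeyError under "During handling of the above exception".
            raise RuntimeError("Could not read config: %s" % err)

    load_config({})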

Having heard Greg speak on this, I’m actually really excited to hear him say that Python 3 is no harder to learn than Python 2 for English speakers, as we took some of the more advanced concepts from Python 2 and made them no longer optional when designing Python 3. The Python 3 “Hello World!” now introduces users to string literals, builtins, function calls and expression statements, rather than just to string literals and a single dedicated print statement. Iterators arrive much earlier in the curriculum than they used to, as does Unicode. The chained exceptions that are essential for improving the experience of debugging obscure production failures can present some readability challenges for new users. If we’ve managed to front load all of that hard earned experience into the base design of the language and the end result is “just as easy to learn as Python 2”, then I’m happy with that. It means we were wrong when we thought we were making those changes for the benefit of beginners - it turns out English speaking beginners aren’t at a point where the issues we addressed are even on their radar as possible problems. But Greg’s feedback now suggests to me that we have actually succeeded in removing some of the barriers between competence and mastery, without harming the beginner experience. There are also other changes in Python 3, like the removal of the “__init__.py” requirement for package directories, the improvements to error messages when functions are called incorrectly, the inclusion of additional standard library modules like statistics, asyncio and ipaddress, the bundling of pip, and the more automated configuration of Windows systems in the installer, all of which should genuinely improve the learning experience for new users.
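
The “Hello World!” comparison is the simplest way to see that shift in what gets introduced up front - the Python 3 spelling is an ordinary call to a builtin rather than a dedicated statement:

    # Python 2: a dedicated print statement
    print "Hello World!"

    # Python 3: the print builtin, called in an ordinary expression statement
    print("Hello World!")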

Greg’s also correct that any renaming of existing standard library functionality should be driven by objective user studies - we learned that the hard way, by discovering that the name changes and rearrangements we made in the Python 3 transition based on our own intuition were largely an annoying waste of time, one that compatibility modules like six and future now have to help folks handle when moving from Python 2 to Python 3. However, we’re not exactly drowned in offers to do that research, so unless someone can figure out how to get it funded and executed, it isn’t going to happen any time soon. As soon as someone does figure that out, though, I look forward to seeing Python Enhancement Proposals backed specifically by research done to make the case for particular name changes, including assessments of the additional cognitive load imposed by students having to learn both the new names suggested by the usability research and the old names that will still have to be kept around for backwards compatibility reasons. In the meantime, we’ll continue with the much lower cost “use expert intuition and arguing on the internet to name new things, leave the names of existing things alone” approach. That low cost option almost certainly doesn’t find optimal names for features, but it does tend to find names that are good enough.

The other piece that we’re really missing is feedback from folks teaching Python to users in languages other than English. Much of the design of Python 3 is aimed at working better with East Asian and African languages where there are no suitable 8-bit encodings - you really need the full power of Unicode to handle them correctly. With suitable library support, Python 2 can be made to handle those languages at the application level, but Python 3 aims to handle them at the language and interpreter level - Python shouldn’t fail just because a user is attempting to run it from their home directory and their name can’t be represented using the latin-1 alphabet (or koi8-r, or some other 8-bit encoding). Similarly, naming a module in your native language shouldn’t mean that Python can’t import it, but in Python 2, module names (like all identifiers) are limited to the ASCII character set. Python 3 lifts the limitations on non-ASCII module names and identifiers in general, meaning that imposing such restrictions enters the domain of project-specific conventions that can be enforced with tools like pylint, rather than being an inherent limitation of the language itself.
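
For instance, the following is legal Python 3 code, but a syntax error in Python 2 (the identifiers are just illustrative - π is pi, 半径 is "radius", 円周 is "circumference"):

    # Non-ASCII identifiers are allowed in Python 3 (PEP 3131),
    # but rejected with a SyntaxError by Python 2.
    π = 3.14159
    半径 = 2.0
    print("円周 =", 2 * π * 半径)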

But, but, surely fixing the GIL is more important than fixing Unicode…

With Eric Snow’s publication of his intent to investigate enhancing CPython’s existing subinterpreter model to provide native support for Communicating Sequential Processes based parallel execution, the discussion of Python’s multicore processing support that previously appeared here has been moved out to its own article.

Well, why not just add JIT compilation, then?

Note

This answer was written for Python 3.5. While CPython 3.6 still doesn’t ship with a JIT compiler by default, it does ship with a dynamic frame evaluation hook that allows third party method JITs like Pyjion to be enabled at runtime.

This is another one of those changes which is significantly easier said than done - the problem is with the “just”, not the “add JIT compilation”. Armin Rigo (one of the smartest people I’ve had the pleasure of meeting) tried to provide one as an extension module (the psyco project) but eventually grew frustrated with working within CPython’s limitations and even the limitations of existing compiler technology, so he went off and invented an entirely new way of building language interpreters instead - that’s what the PyPy project is, a way of writing language interpreters that also gives you a tracing JIT compiler, almost for free.

However, while PyPy is an amazing platform for running Python applications, the extension module compatibility problems introduced by its different approach to memory management (a tracing garbage collector, rather than CPython’s reference counting) mean it isn’t yet quite as good as CPython as an orchestration system, so those users in situations where their Python code isn’t the performance bottleneck stick with the simpler platform. That currently includes scientists, Linux vendors, Apple, cloud providers and so on and so forth. As noted above when discussing the possible future of concurrency in Python, it seems entirely plausible to me that PyPy will eventually become the default application runtime for Python software, with CPython being used primarily as a tool for handling orchestration tasks and embedding in other applications, and only being used to run full applications if PyPy isn’t available for some reason. That’s going to take a while though, as vendors are currently still wary of offering commercial support for PyPy - not through lack of technical merit, but simply because it represents an entirely new way of creating software and they’re not sure if they trust it yet. They’ll likely get over those reservations eventually, but it’s going to take time - as the CPython core development team have good reason to know, adoption of new platforms is a slow, complex business, especially when many users of the existing platform don’t experience the problem that the alternative version is aiming to solve.

While PyPy is a successful example of creating a new Python implementation with JIT compilation support (Jython and IronPython benefit from the JIT compilation support in the JVM and CLR respectively), the Unladen Swallow project came about when some engineers at Google attempted to add a JIT compiler directly to the CPython code base.

The Unladen Swallow team did have a couple of successes: they made several improvements to LLVM to make it more usable as a JIT compiler, and they put together an excellent set of Python macro benchmarks that are used by both PyPy and CPython for relative performance comparisons to this day. However, even though Guido gave in-principle approval for the idea, the one thing they didn’t succeed at was adding implicit JIT compilation support directly to CPython.

The most recent attempt at bringing JIT compilation to CPython is a project called Numba. Unlike psyco, Numba doesn’t attempt to provide implicit JIT compilation of arbitrary Python code - instead, you have to decorate the functions you would like accelerated. The advantage of this is that Numba doesn’t need to cope with the full dynamism of Python the way PyPy does - it can tweak the semantics within the decorated functions to reduce the dynamic nature of the language a bit, allowing for simpler optimisation.
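
A minimal sketch of that opt-in model follows (the function itself is purely illustrative; nopython=True asks Numba to report an error for anything it can't fully compile, rather than silently falling back to the interpreter):

    import numpy as np
    from numba import jit

    @jit(nopython=True)            # only this function is JIT compiled
    def sum_values(values):
        total = 0.0
        for v in values:           # the explicit loop is what Numba accelerates
            total += v
        return total

    print(sum_values(np.arange(1e6)))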

Anyone that is genuinely interested in getting implicit JIT support into the default CPython implementation would do well to look into resurrecting the speed.python.org project. Modelled after the speed.pypy.org project (and using the same software), this project has foundered for lack of interested volunteers and leadership. It comes back to the problem noted above - if you’re using Python for orchestration, the Python code becoming a bottleneck is usually taken as indicating an architectural issue rather than the Python runtime being too slow.

The availability of PyPy limits the appeal of working on adding JIT compilation to CPython as a volunteer or sponsoring it as a commercial user even further - if all of the extensions an application needs are also available on PyPy, then it’s possible to just use that instead, and if they aren’t available, then porting them or creating alternatives with cffi or a pure Python implementation is likely to be seen as a more interesting and cost effective solution than attempting to add JIT compilation support to CPython.

I actually find it quite interesting - the same psychological and commercial factors that work against creating Python 2.8 and towards increasing adoption of Python 3 also work against adding JIT compilation support to CPython and towards increasing adoption of PyPy for application style workloads.

What about <insert other shiny new feature here>?

The suggestions that adding a new carrot like free threading or a JIT compiler to Python 3 would suddenly encourage users that are happy with Python 2 to migrate generally misunderstand the perspective of conservative users.

Early adopters are readily attracted by shiny new features - that’s what makes them early adopters. And we’re very grateful to the early adopters of Python 3 - without their interest and feedback, there’s no way the new version of the language would have matured as it has over the last several years.

However, the kinds of things that attract conservative users are very different - they’re not as attracted by shiny new features as they are by reliability and support. For these users, the question isn’t necessarily “Why would I start using Python 3?”, it is more likely to be “Why would I stop using Python 2?”.

The efforts of the first several years of Python 3 deployment were about positioning it to start crossing that gap between early adopters and more conservative users. Around 2014, those pieces started falling into place, especially as more enterprise Linux vendors brought supported Python 3 offerings to market.

This means that while conservative users that are already using Python are likely to stick with Python 2 for the time being (“if it isn’t broken for us, why change it?”), new conservative users will see a fully supported environment, and 3 is a higher number than 2, even if the ecosystem still has quite a bit of catching up to do (conservative users aren’t going to be downloading much directly from PyPI either - they often prefer to outsource that kind of filtering to software vendors rather than doing it themselves).
