Tue, Dec 10, 2019
Welcome back to the fifth yearly edition of our Top Python Libraries list. Here you will find some hidden gems of the open-source world to get you started on your new project or spice up your existing ones. You'll find machine learning and non-machine learning libraries, so we've got you covered.
We hope you enjoy it as much as we enjoyed creating it, so here we go!
As a die-hard Python fan who usually interacts with APIs, you are probably familiar with the requests library. However, requests will do you no good if you are using the async paradigm, which is increasingly common in modern, high-performance applications.
To solve this, the awesome Tom Christie and collaborators bring us HTTPX, a next-generation async HTTP client for the new decade.
Built to be as easy to use as requests, HTTPX gives you its standard features as well as HTTP/1.1 and HTTP/2 support. Other features include the ability to call directly into a Python web application using the ASGI protocol, and a fully type-annotated codebase.
Do you need to make a large number of requests concurrently? Then HTTPX is the new go-to answer.
Note: HTTPX is still considered alpha and is currently being developed only as an async client. A sync client will be reintroduced in the future.
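Since HTTPX's API was still in flux at the time of writing, the sketch below sticks to the standard library and shows only the async pattern that makes many concurrent requests cheap: a semaphore caps how many requests are in flight, and asyncio.gather() awaits them all while keeping results in order. The fake_fetch coroutine is a hypothetical stand-in that sleeps instead of touching the network; in current HTTPX releases you would await a call such as client.get(url) on an httpx.AsyncClient in its place.

```python
import asyncio
import time
from typing import List


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET; sleeps instead of using the network.

    With HTTPX you would instead await something like client.get(url)
    on an httpx.AsyncClient and read the response.
    """
    await asyncio.sleep(0.1)  # simulated network latency
    return f"body of {url}"


async def fetch_all(urls: List[str], max_in_flight: int = 10) -> List[str]:
    # The semaphore caps concurrent requests so that thousands of URLs
    # do not turn into thousands of simultaneous connections.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(url: str) -> str:
        async with sem:
            return await fake_fetch(url)

    # gather() runs the coroutines concurrently and keeps results in input order.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


if __name__ == "__main__":
    urls = [f"https://example.com/item/{i}" for i in range(50)]
    start = time.perf_counter()
    bodies = asyncio.run(fetch_all(urls))
    print(len(bodies), f"{time.perf_counter() - start:.1f}s")
```

With 50 URLs and at most 10 in flight, the run takes roughly five rounds of the 0.1 s delay rather than fifty, which is the whole appeal of an async client.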
Starlette is a lightweight ASGI framework/toolkit with a bunch of features, including WebSocket and GraphQL support, in-process background tasks and really high performance. All of this comes with a 100% type-annotated codebase and zero hard dependencies. Think of it as a very lightweight, modern and async version of Flask.
It also gives you the flexibility to choose whether to use it as a complete web framework or just an ASGI toolkit.
It runs on top of an ASGI server such as uvicorn, which made it to this same list last year.
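To make "ASGI" concrete, here is a minimal sketch of the interface Starlette builds on and servers like uvicorn call into: an application is an async callable taking scope, receive and send, and it answers an HTTP request by sending a response-start message followed by a body message. The call_app helper is a hypothetical in-process driver standing in for a real server, and its scope dictionary is trimmed to the few keys this app reads.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: no framework, just the protocol."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}".encode()
    # An HTTP response is two messages: status and headers, then the body.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path: str) -> list:
    """Hypothetical in-process driver playing the role of an ASGI server."""
    # Real servers send a fuller scope; this keeps only what app() reads.
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        # A GET request has an empty body delivered in a single message.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_app("/items")):
        print(message)
```

Saved as example.py, the same app could be served with uvicorn example:app. Starlette's job is to wrap this low-level message passing in routing and in Request and Response objects.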
If you are thinking of developing a new web application you should definitely give Starlette a chance to shine.
Starlette is awesome, but it's very minimalistic and non-opinionated. This gives you a lot of freedom, but sometimes, you just need a framework to get things done right and fast.
FastAPI by Sebastián Ramírez is just that. It is fast in every sense of the word.
This new framework for building APIs with Python features very high performance and automatic interactive documentation based on the OpenAPI standard. It supports Swagger UI and ReDoc out of the box, which let you call and test your API directly from a browser, speeding up development. Building APIs with this framework is fast and easy.
This library also takes advantage of one of the modern Python best practices: type hints. FastAPI uses type hints for many things, but one of the coolest features is automatic data validation and conversion, powered by Pydantic.
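FastAPI's real machinery is Pydantic, but the core idea fits in a few lines of standard-library Python: read the type hints declared on a class and use them to convert raw input, which usually arrives as strings from a query string or form, into typed values, failing loudly when a value doesn't fit. The parse function and Item class below are hypothetical illustrations, not FastAPI or Pydantic APIs, and they only handle plain types such as str, int and float.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints


@dataclass
class Item:
    name: str
    price: float
    quantity: int = 1


def parse(cls, raw: dict):
    """Toy annotation-driven parser: coerce each raw value to the type
    declared on the dataclass, raising ValueError that names the bad field."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in raw:
            if f.default is MISSING and f.default_factory is MISSING:
                raise ValueError(f"field {f.name!r}: missing")
            continue  # let the dataclass fill in its default
        try:
            kwargs[f.name] = hints[f.name](raw[f.name])  # e.g. float("1.5")
        except (TypeError, ValueError) as exc:
            raise ValueError(f"field {f.name!r}: {exc}") from None
    return cls(**kwargs)


if __name__ == "__main__":
    # Query-string values arrive as text; the hints turn them into numbers.
    print(parse(Item, {"name": "pen", "price": "1.5", "quantity": "3"}))
    # Item(name='pen', price=1.5, quantity=3)
```

In FastAPI itself you get this for free: a function decorated with @app.get("/items/{item_id}") and declared as async def read_item(item_id: int) receives item_id already converted to an integer, and a non-numeric value is rejected with a validation error.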
Because it is built on top of Starlette, FastAPI performs on par with NodeJS and Go, and it also has native WebSocket and GraphQL support.
Last, but not least, it has some of the best technical documentation ever written for an open-source library. Seriously, check it out!
The folks at MagicStack are back with immutables, a simple yet elegant immutable mapping type (a "frozen dict").
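Who could benefit from this? Well, the underlying data structure is a hash array mapped trie (HAMT), as used in functional programming languages such as Haskell. The most interesting part is that HAMTs give O(log N) performance for both set() and get() operations, which is essentially O(1) for relatively small mappings. Because set() returns a new mapping that shares most of its structure with the old one, "modifying" an immutable map never requires copying the whole thing.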
If your application works with large dictionaries that it copies or treats as immutable, and could use a performance boost, this cool new library may be worth checking out.
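To see why set() can be cheap on an immutable map, here is a compact, pure-Python sketch of a HAMT. It illustrates the data structure only and is not the immutables library's implementation, which ships a C version. Each level of the trie consumes 5 bits of the key's hash, a bitmap records which of the 32 slots are occupied, and set() copies only the nodes on the path to the changed slot, sharing everything else with the previous version. HamtMap and its helpers are hypothetical names, and deletion and iteration are omitted for brevity.

```python
BITS = 5                      # hash bits consumed per trie level
MASK = (1 << BITS) - 1        # 0b11111 -> 32-way branching
_MISSING = object()


def _hash(key) -> int:
    # Treat Python's (possibly negative) hash as an unsigned 64-bit integer.
    return hash(key) & 0xFFFF_FFFF_FFFF_FFFF


class _Bucket:
    """Leaf holding every (key, value) pair whose full hash equals h."""
    __slots__ = ("h", "pairs")

    def __init__(self, h, pairs):
        self.h, self.pairs = h, pairs


class _Node:
    """Interior node: bit i of bitmap is set iff slot i has a child.
    Children are stored densely, so a node only pays for occupied slots."""
    __slots__ = ("bitmap", "children")

    def __init__(self, bitmap, children):
        self.bitmap, self.children = bitmap, children


def _slot(node, h, shift):
    bit = 1 << ((h >> shift) & MASK)
    pos = bin(node.bitmap & (bit - 1)).count("1")  # popcount of lower bits
    return bit, pos


def _merge(a, b, shift):
    """Build the smallest subtree holding two buckets with different hashes."""
    ia, ib = (a.h >> shift) & MASK, (b.h >> shift) & MASK
    if ia == ib:  # still colliding at this level: go one level deeper
        return _Node(1 << ia, (_merge(a, b, shift + BITS),))
    children = (a, b) if ia < ib else (b, a)
    return _Node((1 << ia) | (1 << ib), children)


def _set(node, key, value, h, shift):
    """Return (new_node, added) without mutating node (path copying)."""
    bit, pos = _slot(node, h, shift)
    kids = node.children
    if not node.bitmap & bit:  # empty slot: insert a new bucket
        new = _Bucket(h, ((key, value),))
        return _Node(node.bitmap | bit, kids[:pos] + (new,) + kids[pos:]), 1
    child = kids[pos]
    if isinstance(child, _Node):
        new, added = _set(child, key, value, h, shift + BITS)
    elif child.h == h:  # same full hash: update or extend the bucket
        others = tuple(p for p in child.pairs if p[0] != key)
        added = int(len(others) == len(child.pairs))
        new = _Bucket(h, others + ((key, value),))
    else:  # different hash sharing this slot: push both down a level
        new, added = _merge(child, _Bucket(h, ((key, value),)), shift + BITS), 1
    return _Node(node.bitmap, kids[:pos] + (new,) + kids[pos + 1:]), added


def _get(node, key, h):
    shift = 0
    while True:
        bit, pos = _slot(node, h, shift)
        if not node.bitmap & bit:
            return _MISSING
        child = node.children[pos]
        if isinstance(child, _Node):
            node, shift = child, shift + BITS
            continue
        if child.h == h:
            for k, v in child.pairs:
                if k == key:
                    return v
        return _MISSING


class HamtMap:
    """Immutable mapping: set() returns a new map sharing untouched nodes."""
    __slots__ = ("_root", "_len")

    def __init__(self, _root=None, _len=0):
        self._root = _Node(0, ()) if _root is None else _root
        self._len = _len

    def set(self, key, value):
        root, added = _set(self._root, key, value, _hash(key), 0)
        return HamtMap(root, self._len + added)

    def get(self, key, default=None):
        v = _get(self._root, key, _hash(key))
        return default if v is _MISSING else v

    def __getitem__(self, key):
        v = _get(self._root, key, _hash(key))
        if v is _MISSING:
            raise KeyError(key)
        return v

    def __contains__(self, key):
        return _get(self._root, key, _hash(key)) is not _MISSING

    def __len__(self):
        return self._len


if __name__ == "__main__":
    m1 = HamtMap().set("a", 1)
    m2 = m1.set("b", 2)  # m1 is left untouched
    print(len(m1), len(m2), "b" in m1, m2["b"])  # 1 2 False 2
```

With the real library, the equivalent calls would be immutables.Map(a=1) followed by m.set("b", 2), which likewise returns a new map and leaves the original unchanged.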
Pyodide is one of these projects that can truly blow your mind. It brings the Python scientific stack to the browser using WebAssembly, taking scientific computation to a whole new level.
Want to crunch some numbers with NumPy? Process some larger DataFrames with Pandas? Plot your results using Matplotlib? All of these and even more are now possible from the comfort of your browser, thanks to Pyodide.
What's even better: the packages directory lists over 35 packages that are currently available. Truly, the sky's the limit.
Modin's motto is "Scale your Pandas workflow by changing a single line of code," and it really is that simple. Just install Modin, change your import statements and reap the benefits of up to a 4x speedup on modern laptops with multi-core processors.
How does it do it? We'll let you in on the secret. Modin implements its own modin.pandas.DataFrame object, which is a light-weight parallel DataFrame. Using this object is transparent because it is API-compatible with Pandas, and in the background, it will distribute data and computation using a computation engine such as Ray or Dask.
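In user code the change really is one line: import modin.pandas as pd in place of import pandas as pd. The standard-library sketch below illustrates the partition-then-combine idea behind that kind of speedup for a single operation, a column mean: the column is split into row partitions, each worker computes a partial (sum, count), and the partials are merged. It is an illustration under those assumptions, not Modin's engine, and partition, partial_mean and parallel_mean are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(values: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Split a column into at most n_parts contiguous row partitions (n_parts >= 1)."""
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_mean(part: Sequence[float]) -> Tuple[float, int]:
    # Each worker returns (sum, count): partial results that combine exactly.
    return sum(part), len(part)


def parallel_mean(values: Sequence[float], n_parts: int = 4) -> float:
    if not values:
        raise ValueError("mean of empty column")
    parts = partition(values, n_parts)
    # Threads keep the sketch portable; engines such as Ray or Dask would
    # run these tasks in separate processes or on other machines.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_mean, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))  # 50.5
```

Returning (sum, count) rather than a per-partition mean is the key design choice: averaging partition means would be wrong whenever partitions differ in size. Because of the GIL, threads here only demonstrate the structure; real parallel speedups for CPU-bound work come from processes or an engine like Ray or Dask.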
Sometimes, getting large speedups only requires minor changes to your code, and Modin is a testament to that.
In every non-trivial machine learning project there comes a time at which you will end up needing to manually interact with the model and your data.
Instead of spending hours of effort and writing thousands of lines of code to develop an application, Streamlit allows you to quickly build apps to share your model and analyses. Creating a UI to interact with and visualize your data and your model's outputs is now as easy as pie.
Streamlit provides a fast way to go from your Python scripts to a production-level app just by adding a few lines to your code. TensorFlow, Keras, PyTorch, Pandas: you name it, Streamlit works with just about any data science tool.
If you are doing any machine learning related work, you have probably heard about the important advancements in natural language processing (NLP) occurring over the past year.
Many new, high-performing models such as BERT, XLNet and RoBERTa have been developed, significantly advancing the state of the art across a wide variety of NLP tasks (such as text classification, machine translation, named entity recognition and many more).
For practitioners, it is important to have tools that can power production applications built on these models without being too complex to use. For researchers, it is important to have libraries whose internals can be tweaked, where new models can be developed and experimented with without wasting too much time writing boilerplate code.
The amazing folks at Hugging Face bring us transformers, a library that packages pre-trained, ready-to-use implementations of the most modern NLP models. Interoperability between TensorFlow 2.0 and PyTorch helped catapult this library to industry-standard status, powering both research and production applications. The team also moves very fast, frequently adding new models to the library as researchers develop them.
The cherry on top: the Hugging Face team developed DistilBERT, a distilled version of BERT that is smaller, faster, cheaper and lighter.
Are you still on the fence about making the switch to modern NLP using Hugging Face / Transformers? Today's your lucky day: you can check out their great online demo and marvel at its power.
Facebook's AI research team (FAIR) has been pushing the limits of computer vision (CV) through developments of new models for tasks like object detection, pose estimation, semantic / instance segmentation, and lately, panoptic segmentation.
The possibility of solving many of these problems seemed like science fiction only a couple of years ago. We've come to expect nothing but the very best from FAIR, and this time they managed to shake up the scene once again.
Detectron2 is the much-awaited sequel to Detectron, built from the ground up with PyTorch and packed with state-of-the-art computer vision algorithms.
Libraries such as these are particularly difficult to engineer because of the diverse use cases they must support. As with Hugging Face's Transformers, the FAIR team did a great job designing Detectron2 in a very malleable and modular fashion, which makes it very appealing for CV research applications. At the same time, it is extremely simple to use, making it ideal for people who just want to get quick results without messing with internals. Yes, you can use Detectron2 and have your software "understand" images with only a few lines of Python code.
Time will tell if Detectron2 succeeds at building a vibrant community, but things are looking pretty promising so far. It may very well become the "go-to" solution for CV applications, where new (faster and better) models are contributed as researchers create them. If you are doing any sort of CV work, keep this on your radar!
This is literally the new kid on the block, so new it barely made it onto this 2019 list! But don't be fooled: although it was released less than two weeks ago, it had already been battle-tested internally at Netflix, where it was refined for two years before being open-sourced.
Metaflow is a Python library that helps data scientists and engineers build and manage real-life projects. Its main focus is to take technical burdens such as compute resources, parallel execution, architecture design and versioning, to name a few, off data scientists' shoulders. Netflix partnered with AWS so that you can easily define complex data flows with out-of-the-box support for distributed computing.
We are already evaluating Metaflow for some key projects inside Tryolabs. If you are interested in learning more about this tool, check out Netflix's release blog post.
Another year (decade?!) has passed, leaving behind meaningful contributions to the open-source world that will be relevant for years to come. You can check the libraries' evolution in our previous editions: 2015, 2016, 2017, 2018.
We would like to take a few lines to thank everyone in the community for their valuable contributions, and you, the reader, for making it this far in our blog post.
Oh, and BTW, if we’ve left out your favorite Python library, please feel free to comment below. We'd love to hear what you have to say.