Skip to main content

· 5 min read
Haidong Lan

In our recently published blog: Taichi AOT, the solution for deploying kernels in mobile devices [1], we demonstrated how to deploy a gravity-based, interactive physical simulation on an Android mobile phone. As we know, the computing capability of mobile devices is limited by hardware and production costs and is barely satisfactory. The question then arises: Is Taichi Lang able to make better use of the underlying hardware than other native, low-level programming languages? With this question in mind, we kick-started the benchmark project in an attempt to provide a comprehensive and accurate performance evaluation of Taichi Lang.

Taichi Lang is a domain-specific language (DSL) and can solve a great many numerical computing problems with just a few lines of code. We established a testing set of frequently used algorithms, from which we compared Taichi Lang with the top performer (CUDA) in the field on every benchmark. To put it differently, we compared Taichi Lang's implementation of these algorithms with other top-notch third-party implementations. The aim is to evaluate the effectiveness of the language's inbuilt optimization mechanism and to look for room for improvement. Further, the comparison between Taichi Lang and CUDA uses Nvidia GPUs to ensure that we evaluate Taichi Lang with the highly optimized CUDA code.

· 6 min read
Ye Kuang

Physical simulation, which Taichi Lang is best at, has wide applications on mobile devices, such as real-time physically-based interactions in mobile games or cool visual effects in short videos. This is thanks to Taichi's features such as fast prototyping and cross-platform GPU acceleration.

However, Taichi is currently a language embedded in the Python frontend. Python is not the most ideal language when it comes to deployment, because Python's heavy virtual machine design often makes it hard to embed Python in other host languages. Therefore, we've been constantly thinking about how to have Taichi's users enjoy both the rapid iteration of Python and seamless deployment in real industrial scenarios.

· 15 min read
Lin Jiang
Simple is better than complex.

In the previous blog post, we mentioned this sentence, which is a part of the zen of Python. In this post, we will show you how we simplified the code of Taichi. We have refactored the Abstract Syntax Tree (AST) transformer of taichi in the past few months. The error reporting experience has improved dramatically after the refactor. This blog mainly talks about why and how we did that.

· 4 min read
Ailing Zhang

In this blog post I'll briefly talk about the data containers in Taichi and Torch. As you might have already known, both Taichi and Torch have a core concept of multi-dimensional array containers, called taichi.field and torch.Tensor respectively. They, as well as numpy.arrays, share a lot in common so users might think they're exactly the same. Therefore, we want to share a few interesting differences in this blog so that new users don't get confused by similar names or usages.

· 5 min read
Ailing Zhang

"What is the advantage of Taichi over Pytorch or Tensorflow when running on GPU?" Not surprisingly, this is one of the top questions we've received in the Taichi user community. In this blog series we will walk you through some major concepts and components in Taichi and Torch, elaborating on the analogies and differences which might not be obvious at the first sight.

Let’s start with a simple fact. Except for some minor intersections, Taichi and Torch target almost completely different users and applications. Torch is always your first choice for deep learning tasks like computer vision, natural language processing and so on. Taichi, on the other hand, specializes in parallel high-performance numerical computational tasks and really comes in handy when it comes to physics simulation and visual computing tasks.

From a high-level perspective, Taichi looks very similar to Torch in the sense that their main goals are both to lower the bar for their users. Compared to the static computation graph-based Tensorflow 1.0, Torch eager mode changes the game by building the graph on the fly as your python program runs. Similarly, Taichi aims to enable more people to write high-performance parallel programs that used to require a lot of domain knowledge in CUDA, OpenGL, or Vulkan.

· 15 min read
Dunfan Lu

Ever since the Python programming language was born, its core philosophy has always been to maximize the readability and simplicity of code. In fact, the reach for readability and simplicity is so deep within Python's root, that if you type import this in a Python console, it will recite a little poem:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
...

Simple is better than complex. Readability counts. No doubt, Python has indeed been quite successful at achieving these goals: it is by far the most friendly language to learn, and an average Python program is often 5-10 times shorter than equivalent C++ code. Unfortunately, there is a catch: Python's simplicity comes at the cost of reduced performance. In fact, it is almost never surprising for a Python program to be 10-100 times slower than its C++ counterpart. It thus appears that there is a perpetual trade-off between speed and simplicity, and no programming language shall ever possess both.

But don't you worry, all hope is not lost.