The End-to-End Controversy: L4 Autonomy or Marketing Extravaganza?

Marked by Tesla's release of the V12 version of its Full Self-Driving (FSD) intelligent driving system, autonomous driving has seemingly entered the end-to-end era overnight.

"The lower limit capabilities of end-to-end models are expected to rapidly improve next year, and once enhanced, it won't take more than two years to achieve capabilities that surpass the L4 standard globally," said He Xiaopeng, chairman of Xpeng Motors, at the 2024 Hangzhou Yunqi Conference.

After adopting the end-to-end large model, Tesla's FSD is completely different from before, and it could potentially outperform human veteran drivers by next year.

Xpeng Motors is one of the first domestic car manufacturers to follow Tesla's lead, and by the end of July this year, it began rolling out its XNGP intelligent driving system based on the end-to-end large model to users.

By September of this year, companies like Huawei and Li Auto had also started pushing their corresponding intelligent driving systems based on end-to-end large models; NIO, on the other hand, applied the end-to-end large model to its AEB system and released its self-developed world model.

With the adoption of end-to-end large models, car manufacturers have become more aggressive in promoting intelligent driving.

City-by-city rollouts of intelligent driving and high-precision maps, once hot topics, are no longer the focus; instead, door-to-door and point-to-point driving assistance has formally been put on the agenda.

Xpeng Motors even claims that it can deliver an L3+ autonomous driving experience at the hardware cost of L2 intelligent driving.

For the moment, intelligent driving systems without end-to-end capabilities seem to be regarded as outdated.

"All intelligent driving without large models will be eliminated," He Xiaopeng said, adding that all L4 autonomous driving companies should switch to large models as soon as possible.

Chen Tao Capital, in conjunction with three parties, released the "End-to-End Autonomous Driving Industry Research Report" (hereinafter the "Report"). Of the more than 30 frontline autonomous driving experts interviewed for it, 90% said their companies had invested in developing end-to-end technology, and most technology companies believe they cannot afford to miss this technological revolution.

However, not all "players" agree that the end-to-end large model is a disruptor in the current intelligent driving system landscape.

Hou Cong, CTO of Qingzhou Smart Navigation, told reporters from First Financial Daily that he had tried Tesla's FSD V12.3 in the United States. Although it is significantly better than earlier FSD versions, he said, it still lags noticeably behind Waymo's Robotaxi, which targets L4.

Hou Xiaodi, a founder of TuSimple, called on the industry to view the technology rationally and not to mythologize end-to-end.

In this controversy, car company executives such as Musk and He Xiaopeng strongly support end-to-end, while executives from L4 companies such as Hou Cong, Hou Xiaodi, and Lou Tiancheng (CTO of Pony.ai) believe that, technically, an end-to-end large model cannot directly upgrade L2 driving assistance to L4 autonomous driving.

The "Report" also notes that, because the technology is still at an early stage, end-to-end large models face many unresolved difficulties and pain points, including sharply divergent technical routes, heavy data and computing power requirements, immature testing and verification methods, and huge resource investment.

On the road toward fully autonomous driving, the end-to-end large model has become the latest technical-route controversy, following the earlier debate between pure-vision perception and radar-fusion perception.

Is Tesla leading the technological revolution again?

Beginning with technologies such as integrated casting and battery-body integration, Tesla has become the industry benchmark for new energy vehicle technology.

Many Chinese car companies are said to be "crossing the river by feeling for Tesla's stones," and by putting an end-to-end large model into its cars, Tesla has once again led a transformation in new energy vehicles.

Before end-to-end large models reached production vehicles, intelligent driving assistance systems were divided into modules such as perception, planning, decision-making, and control. Artificial intelligence and machine learning were applied mostly in perception, planning, and similar modules, but the modules were defined mainly by hand-written rules, an approach known as "rule-based."

In real-world operation, however, vehicles keep running into corner cases (long-tail problems), and to handle each one, engineers have to write code and add rules for that specific scenario.

Under this approach, intelligent driving assistance or autonomous driving systems require a large number of manually written rules.
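To make the rule-based approach concrete, here is a minimal Python sketch of a planner driven entirely by hand-written rules. It is illustrative only: the `Scene` fields, the thresholds, and the action names are assumptions made up for this example, not taken from any production system. The point is that each newly observed corner case (here, a traffic cone in the lane) has to be handled by adding another rule by hand.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical, heavily simplified perception output."""
    lead_gap_m: float            # distance to the vehicle ahead, in meters
    speed_mps: float             # ego speed, in meters per second
    pedestrian_ahead: bool = False
    cone_in_lane: bool = False   # example of a corner case added later


def rule_based_plan(scene: Scene) -> str:
    """Return a driving action chosen by hand-written rules, checked in order."""
    # Rule 1: always brake for a pedestrian.
    if scene.pedestrian_ahead:
        return "brake"
    # Rule 2: added by an engineer after a corner case was observed.
    if scene.cone_in_lane:
        return "change_lane"
    # Rule 3: keep roughly a two-second gap to the lead vehicle.
    if scene.lead_gap_m < 2.0 * scene.speed_mps:
        return "slow_down"
    # Default when no rule fires.
    return "keep_lane"


print(rule_based_plan(Scene(lead_gap_m=50.0, speed_mps=15.0)))
print(rule_based_plan(Scene(lead_gap_m=20.0, speed_mps=15.0)))
print(rule_based_plan(Scene(lead_gap_m=50.0, speed_mps=15.0, cone_in_lane=True)))
```

Any scenario that no rule anticipates falls through to the default action, which is why the list of rules tends to keep growing.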

Wu Xinzhou, NVIDIA's Global Vice President and Head of the Automotive Business Unit, notes that most existing autonomous driving algorithms are rule-based. Going from what the vehicle sees to what it should do sounds simple, but establishing good rules is very hard: it requires many engineers to anticipate as many possibilities as they can, and the method has an upper limit.

Unlike traditional rule-based driving assistance systems, an end-to-end autonomous driving solution handles the entire process, from perception to planning and control, through advanced algorithms and deep learning.

Applying end-to-end technology turns the original architecture, in which multiple models for perception, prediction, planning, and so on were combined, into a single-model architecture that integrates perception and decision-making.

A research report released by Cinda Securities shows that "end-to-end" refers to one end inputting image and other environmental data information, going through a multi-layer neural network model similar to a "black box" in the middle, and directly outputting driving commands such as steering, braking, and acceleration at the other end.
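The input-to-output shape of such a system can be sketched in a few lines of Python. This is a toy under stated assumptions, not any vendor's model: the class name `EndToEndDriver`, the layer sizes, and the four-element feature vector are invented for illustration, and the weights are random rather than trained. It only shows that sensor features go in one end, pass through a multi-layer network, and come out the other end as steering, braking, and acceleration commands, with no hand-written driving rules in between.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply one dense layer followed by a tanh activation."""
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class EndToEndDriver:
    """Toy stand-in: sensor features in, driving commands out."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
        self.hidden = make_layer(n_features, 8, rng)
        self.head = make_layer(8, 3, rng)

    def __call__(self, features):
        h = dense(features, self.hidden)
        steer, brake, accel = dense(h, self.head)
        # Steering stays in [-1, 1]; brake and throttle are rescaled to [0, 1].
        return {"steer": steer, "brake": (brake + 1) / 2, "accel": (accel + 1) / 2}


driver = EndToEndDriver(n_features=4)
commands = driver([0.2, -0.1, 0.7, 0.0])   # made-up, normalized sensor features
print(commands)
```

In a real system the weights would be learned from driving data, and the only way to change the behavior would be to change the data or the training, not to edit an individual rule.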

Compared with the traditional rule-driven modular architecture, end-to-end will bring a series of advantages. It is fully data-driven and optimizes for the global task, so errors can be corrected better and faster. It further reduces lossy transmission, delay, and redundancy of information between modules, which avoids error accumulation and improves computational efficiency. And it generalizes better: moving from rule-based to learning-based gives the system zero-shot learning ability and stronger decision-making in unfamiliar scenarios.

With the support of an end-to-end large model, an intelligent driving system can iterate and improve faster.

Take Xpeng's XNGP as an example. After adopting the end-to-end large model, its combination of the three-in-one neural network XNet, the planning and control large model XPlanner, and the AI large language model XBrain can iterate every two days, and intelligent driving capability can improve 30-fold within 18 months; the data system and neural network architecture allow rapid diagnosis, so long-tail problems can be solved within hours.

With Tesla's end-to-end large model in production, the intelligent driving technology routes of Chinese car companies also began to change significantly in 2024.

In the past few years, the technical-route debate over Chinese car companies' driving assistance systems centered mostly on pure-vision versus fusion perception, while competition in the market was more about how quickly, and in how many cities, the systems were opened to users.

At the beginning of 2024, companies like Huawei and Xpeng were still competing to drop high-precision maps and to offer driving assistance that truly "works nationwide."

Once end-to-end large models were deployed, the generalization ability of driving assistance systems improved greatly, and verifying and opening the system region by region became less important.

At the same time, the end-to-end model blurred the earlier boundaries between the perception, planning, decision-making, and control modules, and many car companies began to reorganize their autonomous driving teams around the needs of the end-to-end large model.

At the end of 2023, Li Auto reorganized its intelligent driving team, forming a large-model team under the front-end algorithm R&D team that is responsible for developing the end-to-end architecture and bringing it to vehicles. In 2024, NIO established a large model department, a deployment architecture and solution department, and a spatiotemporal information department, and abolished the original perception, planning and control, environmental information, and solution delivery departments.

Although end-to-end is in full swing, most Chinese car companies have not yet achieved the theoretical "One Model" end-to-end intelligent driving.

A CTO of an autonomous driving company told reporters that the application of end-to-end models in intelligent driving can be divided into two stages. The first is the two-model solution, which consists of an end-to-end perception model and an end-to-end planning and control model and is currently the more mainstream direction in the industry. The second is the one-model solution, in which a single large model handles everything from information input to decision output. This is closer to the direction of AGI but more difficult, and large-scale application is estimated to be 3~5 years away.
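The structural difference between the two stages can be sketched as follows. This is a hedged illustration, not a description of any company's architecture: the type aliases, function names, and the toy stand-in "models" are all hypothetical, and real systems would use trained networks in their place. The two-model version keeps an explicit intermediate representation between the two learned parts, while the one-model version exposes only sensor input and command output.

```python
from typing import Callable, Dict, List

SensorFrame = List[float]      # hypothetical flattened sensor input
SceneFeatures = List[float]    # hypothetical intermediate representation
Command = Dict[str, float]
Driver = Callable[[SensorFrame], Command]


def two_model_driver(perceive: Callable[[SensorFrame], SceneFeatures],
                     plan: Callable[[SceneFeatures], Command]) -> Driver:
    """Stage one: two learned models joined by an inspectable interface."""
    def drive(frame: SensorFrame) -> Command:
        features = perceive(frame)   # can be logged and checked on its own
        return plan(features)
    return drive


def one_model_driver(model: Driver) -> Driver:
    """Stage two: a single model maps sensors directly to commands."""
    return model


# Placeholder "models" so the sketch runs; real ones would be trained networks.
def toy_perception(frame: SensorFrame) -> SceneFeatures:
    return [sum(frame) / len(frame), max(frame)]


def toy_planner(features: SceneFeatures) -> Command:
    mean, peak = features
    braking = peak > 0.8
    return {"steer": 0.0, "brake": 1.0 if braking else 0.0,
            "accel": 0.0 if braking else mean}


def toy_single_model(frame: SensorFrame) -> Command:
    # Stands in for one network; nothing in between is exposed to the caller.
    return toy_planner(toy_perception(frame))


frame = [0.1, 0.3, 0.9]
print(two_model_driver(toy_perception, toy_planner)(frame))
print(one_model_driver(toy_single_model)(frame))
```

Both drivers return the same command here because the toy single model simply hides the same computation; the difference that matters is whether the intermediate `features` are available for inspection.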

The industry generally believes that the development progress of domestic car companies is about 1.5~2 years behind Tesla.

Gu Junli, deputy general manager of Chery Automobile Co., Ltd., believes that catching up with Tesla's business model requires products at scale.

"When the data reaches the level of millions, as Tesla's has, then through reinforcement training of the model, intelligent driving can learn from the video stream and directly tell the driver which way to drive, just like the now-popular ChatGPT," Gu Junli said.

Do car manufacturers and suppliers have different routes?

While many car companies have launched end-to-end large models one after another and are heralding the era of autonomous driving, many suppliers that focus on autonomous driving have voiced different opinions.

"After Tesla launched the end-to-end FSD, some problems occurred. There were cases, especially at night, in which the car ran onto the shoulder; sometimes it only got scraped, and sometimes it ran straight onto the shoulder and flattened a tire," Hou Cong told reporters. He added that in the United States, Waymo does not use an end-to-end large model, yet it has achieved driverless Robotaxi operations in many cities, and user feedback is quite good.

The end-to-end large model is not itself a new technology that only broke through in recent years.

"Before deep learning emerged around 2010, the approach was called model analysis algorithms. At that time we were doing pedestrian detection at Tsinghua University, and we needed to extract features from the image, such as the curvature of the shoulders or the color of the eyes. Those features were summarized by us, which is rule-based. After deep learning came out, we fed in the image and let deep learning learn on its own, so the distinguishing features of each person were learned by the network rather than defined by humans. This is the same as today's end-to-end, which is learning-based," Hou Cong told reporters, adding that such a system, like today's end-to-end driving assistance, requires massive data support.

This is also considered one of the important reasons car manufacturers are racing to build end-to-end large models.

Compared with L4 autonomous driving suppliers, which operate fleets of only about a hundred test vehicles, car manufacturers usually have hundreds of thousands or even millions of vehicles on the road.

The vast amount of data generated as users drive helps car manufacturers train their own end-to-end intelligent driving systems and iterate them rapidly.

In addition, Dong Jun, an engineer at a supplier of L2+ driving assistance systems, told reporters that for suppliers, end-to-end intelligent driving is hard to turn into a standardized product: a change in vehicle body shape or in sensor mounting positions means the whole system's model must be retrained, which costs a great deal of time and money and is inefficient.

For L2 driving assistance, the significance of end-to-end large models is that they speed up city rollouts and bring forward the "drive anywhere in the country" capability that car manufacturers advertise.

For L4 autonomous driving companies, end-to-end large models can also reduce the system's dependence on high-precision maps in the initial stage of operation, allowing a company to expand its operating area in less time. In the middle and later stages of operation, however, high-precision maps still matter, because they further improve the reliability, safety, and smoothness of the autonomous driving system.

On the other hand, unlike car manufacturers such as Tesla and Li Auto that have achieved profitability, most autonomous driving companies currently rely on financing.

End-to-end large models need not only a large amount of data but also heavy capital investment.

"In the future, when intelligent driving enters the L4 stage, data and computing power will grow exponentially every year, which means at least 1 billion US dollars per year, and continuous iteration will still be needed five years on. At that scale, it is very hard for a company's profitability to support the investment. So the point now is not how many billions of dollars are being invested in autonomous driving, but to start from the essentials: whether there is sufficient computing power and data support, and only then how much money needs to be invested," said Lang Xianpeng, Li Auto's Vice President of Intelligent Driving Research and Development.

Xia Yiping, CEO of Jiyue Auto, believes that 20 billion yuan was once recognized as the threshold for car manufacturing, and now companies cannot do a good job in intelligent driving without 50 billion yuan.

More importantly, for autonomous driving companies like Waymo and Pony.ai that are determined to achieve L4 Robotaxi, their considerations for system weight, cost, and other aspects are very different from those of car manufacturers.

Unlike L2 driving assistance, at L3 and above the responsibility for accidents shifts to the vehicle, which places extremely high demands on the stability and safety of the autonomous driving system.

The black-box nature of end-to-end large models, whose decisions cannot be explained, brings certain risks to the autonomous driving system.

"Car manufacturers keep launching end-to-end large models for intelligent driving and promoting them heavily, but at the core this is still about creating differentiation, with the aim of selling cars," Dong Jun said.

Hou Xiaodi said in a media interview that if Tesla's FSD is involved in an accident, the responsibility still lies with the driver: Tesla requires the driver to keep their hands on the steering wheel at all times, and the accident has nothing to do with Tesla. Moreover, Tesla's business is selling cars, and FSD is added value for those sales.

A company thinking about how to sell more cars cannot, as L4 companies do, focus on one area and solve all of the corner cases (long-tail problems) in it.

Hou Cong and other interviewees from autonomous driving companies argued that L4 autonomous driving demands 100% safety and cannot accept the unexplainability and uncertainty of the end-to-end "black box."

In addition, there is a huge difference in business logic between L2 and L4.

For car manufacturers, selling cars is the main business, and cost determines profit and market competitiveness, so their products inevitably carry limited safety redundancy. L4 Robotaxi companies focus more on operations; for a considerable time theirs will be mainly a business-to-business model that does not serve consumers directly, so they must consider not only the car but also the various situations that arise while operating a fleet.

"For example, what to do if the car gets stuck, what to do if hardware breaks, what to do if an accident occurs. All of this requires more redundancy, and Tesla cannot reserve as much redundancy as Waymo does, because their business logic is different," Hou Cong said.

Can the world model achieve autonomous driving?

Despite these differences, many engineers at autonomous driving companies agreed in interviews that end-to-end large models can raise the upper limit of today's in-car driving assistance systems.

Many practitioners described a "seesaw" effect: an end-to-end large model raises the upper limit of a driving assistance system but also lowers the lower limit of its performance.

"An end-to-end large model is trained as a probabilistic model, and its problem is that in relatively simple, easily described scenes its output is not precise enough, so its lower bound is relatively low. Tesla has done well here but has not completely solved the problem. We believe that under current conditions, with insufficient data, end-to-end still has to be reached gradually, replacing one module at a time and completing the transition while ensuring safety. In this way, a solid engineering infrastructure and rapid iteration can raise the system's performance ceiling step by step while also protecting its performance floor," said Chen Liming, President of Horizon Robotics.

An end-to-end large model is data-driven, with sensor data at the input and driving decisions at the output, but what happens in between is largely unexplainable: people cannot see how the system arrives at its final decision, which is why it is often compared to a black box.

Hou Cong compares the relationship between today's end-to-end large models and earlier rule-based intelligent driving to the way cars are built. "In the past, car manufacturers bought parts from different companies and put them together. For one thing, that made procurement convenient, and with suppliers dispersed it was hard to be 'strangled' by any one of them; for another, it made repairs easy, because whatever broke could be fixed on its own. Multi-module autonomous driving is the same, and its advantage is that problems can be better defined and solved."

With traditional multi-module autonomous driving, if a problem appears during testing, developers can locate the bug in the corresponding module and fix it.

With a black box such as an end-to-end large model, developers can only adjust the training strategy, retrain, or modify the model; they cannot directly edit the parameters inside the "black box."

And as the system is upgraded and iterated, the harder the problems it has to solve, the higher the cost, which sets a high threshold for end-to-end large models.

Furthermore, although end-to-end large models are data-driven, a large amount of data does not necessarily improve the system.

Xiao Bo, head of the AI team at Pony.ai, believes that even with a very good algorithm and very good training, the ability learned from large amounts of human driving data is roughly that of an average human driver. That is enough for L2 driving assistance, but L4 and above requires ability at least 10 times that of a human driver, which this approach cannot support.

Just as end-to-end large models are rapidly gaining popularity, domestic car manufacturers and suppliers have put forward yet another new concept: the "world model."

Lou Tiancheng regards the world model as the most important technology at present and sees it as the only solution for autonomous driving.

The world model can be understood as a simulation and modeling of the real world that faithfully and accurately reproduces how scenes such as intersections evolve.

Examples include the trajectory of an occluded pedestrian who suddenly darts out, the reactions of pedestrians and other vehicles at the moment of a collision, and even details such as the fact that a running person's acceleration can reach gravitational acceleration.

The world model is also a scoring system: it evaluates the performance of autonomous driving systems, so that one can tell whether system A or system B is better.
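The scoring role can be illustrated with a small Python sketch. It rests on strong assumptions: a hand-written, one-dimensional car-following simulator stands in for the world model (a real world model would be learned and far richer), and the scenario parameters, the `Policy` signature, and the two example systems are all invented for this illustration. The idea it shows is only that two driving systems can be run through the same sampled scenarios and ranked by a common score.

```python
import random
from typing import Callable

# (gap in meters, closing speed in m/s) -> ego deceleration in m/s^2
Policy = Callable[[float, float], float]


def simulate(policy: Policy, lead_decel: float,
             steps: int = 100, dt: float = 0.1) -> float:
    """Roll out one lead-vehicle braking scenario; return the minimum gap in meters."""
    gap, ego_v, lead_v = 30.0, 15.0, 15.0
    min_gap = gap
    for _ in range(steps):
        lead_v = max(0.0, lead_v - lead_decel * dt)
        ego_v = max(0.0, ego_v - policy(gap, ego_v - lead_v) * dt)
        gap += (lead_v - ego_v) * dt
        min_gap = min(min_gap, gap)
    return min_gap


def score(policy: Policy, n_scenarios: int = 200, seed: int = 0) -> float:
    """Fraction of sampled scenarios completed without the gap reaching zero."""
    rng = random.Random(seed)   # same seed means both systems see the same scenarios
    safe = sum(simulate(policy, rng.uniform(2.0, 8.0)) > 0.0
               for _ in range(n_scenarios))
    return safe / n_scenarios


def system_a(gap: float, closing_speed: float) -> float:
    """Brakes hard as soon as the gap starts closing."""
    return 8.0 if closing_speed > 0.0 else 0.0


def system_b(gap: float, closing_speed: float) -> float:
    """Brakes only once the gap is already short."""
    return 8.0 if gap < 10.0 else 0.0


print("A:", score(system_a), "B:", score(system_b))
```

In this toy setting the earlier-braking system A receives the higher score, which is the kind of A-versus-B comparison the scoring role implies.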

NIO, Li Auto, and other car manufacturers have already released their own "world models."

Ren Shaoqing, Vice President of NIO Autonomous Driving, said: "Compared with a conventional end-to-end model, we believe the new world model has three main advantages. The first is spatial understanding: through generative models and sensor reconstruction, it extracts information in a more generalized way. The second is that autoregressive models automatically model long time sequences of the environment. The third is data: the world model needs more data, and with self-supervised methods that require no manual labeling, its multivariate autoregressive generative structure allows us to learn better."

Lou Tiancheng believes the world model can be understood as a "coach" simulated by humans. For L2 systems, its driving ability is equivalent to that of a veteran driver; for L4 systems, its driving level is far above that of human drivers, and an intelligent driving system trained by it will definitely outperform human drivers.

Although disputes remain, most interviewees still believe that at the L2 driving assistance stage, end-to-end large models can indeed raise the performance ceiling of these systems.

What most practitioners at L4 autonomous driving companies reject is the way Tesla, Xpeng, and other car manufacturers heavily publicize end-to-end technology and suggest that products based on L2 intelligent driving can achieve L4 autonomous driving capability on L2-level hardware.

"At this stage, car manufacturers are heavily publicizing end-to-end and presenting it as the cutting-edge technology that leads to autonomous driving, but behind that, it is still mostly about selling more cars," Dong Jun said.
