
Amazing New Chinese A.I.-Powered Language Model Wu Dao 2.0 Unveiled

It’s ten times more powerful than the current U.S. effort.


Earlier this month, Chinese artificial intelligence (A.I.) researchers at the Beijing Academy of Artificial Intelligence (BAAI) unveiled Wu Dao 2.0, the world’s biggest natural language processing (NLP) model. And it’s a big deal.

NLP is a branch of A.I. research that aims to give computers the ability to understand text and spoken words and respond to them in much the same way human beings can.

Last year, the San Francisco–based nonprofit A.I. research laboratory OpenAI wowed the world when it released its GPT-3 (Generative Pre-trained Transformer 3) language model. GPT-3 is a 175 billion–parameter deep learning model trained on text datasets containing hundreds of billions of words. A parameter is a value inside a neural network, tuned during training, that weights each piece of input data more or less heavily; collectively, the parameters encode the network’s learned perspective on the data.
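To make that concrete, here is a minimal sketch (not from the article) of how parameters are tallied in a tiny fully connected network; the layer sizes are made up for illustration.

```python
# Hypothetical illustration: counting the learned parameters (weights and
# biases) of a small fully connected network. Each parameter is a number
# tuned during training; GPT-3 has 175 billion of them.

layer_sizes = [512, 256, 10]  # input -> hidden -> output (made-up sizes)

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # one weight per input-output connection
    biases = n_out          # one bias per output unit
    total += weights + biases

print(f"parameters: {total:,}")  # 133,898 for this toy network
```

Repeating the same bookkeeping across dozens of transformer layers and wide hidden dimensions is what pushes counts into the billions.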

ZeRO-Infinity and DeepSpeed: Unlocking unprecedented model scale for deep learning training

Since the DeepSpeed optimization library was introduced last year, it has rolled out numerous novel optimizations for training large AI models—improving scale, speed, cost, and usability. As large models have quickly evolved over the last year, so too has DeepSpeed. Whether enabling researchers to create the 17-billion-parameter Microsoft Turing Natural Language Generation (Turing-NLG) with state-of-the-art accuracy, achieving the fastest BERT training record, or supporting 10x larger model training using a single GPU, DeepSpeed continues to tackle challenges in AI at Scale with the latest advancements for large-scale model training. Now, the novel memory optimization technology ZeRO (Zero Redundancy Optimizer), included in DeepSpeed, is undergoing a further transformation of its own. The improved ZeRO-Infinity offers the system capability to go beyond the GPU memory wall and train models with tens of trillions of parameters, an order of magnitude bigger than state-of-the-art systems can support. It also offers a promising path toward training 100-trillion-parameter models.

ZeRO-Infinity at a glance: ZeRO-Infinity is a novel deep learning (DL) training technology for scaling model training from a single GPU to massive supercomputers with thousands of GPUs. It powers unprecedented model sizes by leveraging the full memory capacity of a system, concurrently exploiting all heterogeneous memory (GPU, CPU, and Non-Volatile Memory Express, or NVMe for short). Learn more about ZeRO-Infinity’s highlights in our paper, “ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning.”
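As a rough sketch of how this heterogeneous-memory offloading is switched on in practice, a DeepSpeed configuration might look like the following. The model, batch size, learning rate, and NVMe path are all placeholders, and the exact keys should be checked against the DeepSpeed documentation.

```python
import torch.nn as nn
import deepspeed

# Placeholder model; in practice this is your multi-billion-parameter network.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Sketch of a config enabling ZeRO stage 3 with parameters and optimizer
# state offloaded to NVMe — the mechanism ZeRO-Infinity builds on.
ds_config = {
    "train_batch_size": 16,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer state
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```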

The coming productivity boom

When you put these three factors together—the bounty of technological advances, the compressed restructuring timetable due to covid-19, and an economy finally running at full capacity—the ingredients are in place for a productivity boom. This will not only boost living standards directly but also free up resources for a more ambitious policy agenda.


AI and other digital technologies have been surprisingly slow to improve economic growth. But that could be about to change.

A Detective AI Can Identify Obscure People From Multiple Sources

Researchers at Oxford University have developed an AI-enabled system that can comprehensively identify people in videos by conducting detective-like, multi-domain investigations into who they might be, drawing on context and on a variety of publicly available secondary sources, including matching audio against visual material from the internet.

Though the research centers on the identification of public figures, such as people appearing in television programs and films, the principle of inferring identity from context is theoretically applicable to anyone whose face, voice, or name appears in online sources.

Indeed, the paper’s own definition of fame is not limited to show-business workers, with the researchers declaring, ‘We denote people with many images of themselves online as famous.’
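To illustrate the general idea of cross-modal matching (this is not the Oxford team’s actual pipeline, and the face and voice embedding models are assumed to exist elsewhere), a minimal sketch might fuse similarity scores from both modalities against a gallery of known identities:

```python
import numpy as np

# Rough sketch of cross-modal identification (NOT the Oxford pipeline):
# embeddings for a face crop and a voice clip are compared against a
# gallery of labeled reference embeddings, and the two scores are fused.

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(face_emb, voice_emb, gallery):
    """gallery maps name -> (reference face embedding, reference voice embedding)."""
    scores = {
        name: 0.5 * cosine(face_emb, f) + 0.5 * cosine(voice_emb, v)
        for name, (f, v) in gallery.items()
    }
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
gallery = {n: (rng.normal(size=128), rng.normal(size=128)) for n in ["alice", "bob"]}
probe_face, probe_voice = gallery["bob"]
print(identify(probe_face + 0.1 * rng.normal(size=128), probe_voice, gallery))  # "bob"
```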

U.S. Launches Task Force to Study Opening Government Data for AI Research

WASHINGTON—The Biden administration launched an initiative Thursday aiming to make more government data available to artificial intelligence researchers, part of a broader push to keep the U.S. on the cutting edge of the crucial new technology.

The National Artificial Intelligence Research Resource Task Force, a group of 12 members from academia, government, and industry led by officials from the White House Office of Science and Technology Policy and the National Science Foundation, will draft a strategy for creating an AI research resource that could, in part, give researchers secure access to stores of anonymous data about Americans, from demographics to health and driving habits.

The task force would also look to make computing power available for analyzing the data, with the goal of allowing access to researchers across the country.

A California Startup Now Offers a Full EV Battery in Just 10 Minutes

(Bloomberg) — On a Wednesday afternoon in May, an Uber driver in San Francisco was about to run out of charge on his Nissan Leaf. Normally this would mean finding a place to plug in and wait for a half hour, at least. But this Leaf was different.

Instead of plugging in, the driver pulled into a swapping station near Mission Bay, where a set of robot arms lifted the car off the ground, unloaded the depleted batteries and replaced them with a fully charged set. Twelve minutes later the Leaf pulled away with 32 kilowatt-hours of energy, enough to drive about 130 miles, for a cost of $13.
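Those figures imply some simple per-mile economics; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the figures reported in the article.
energy_kwh = 32      # energy delivered by the swapped pack
cost_usd = 13        # price of the swap
range_miles = 130    # claimed driving range

print(f"${cost_usd / energy_kwh:.2f} per kWh")          # ~$0.41/kWh
print(f"${cost_usd / range_miles:.2f} per mile")        # ~$0.10/mile
print(f"{range_miles / energy_kwh:.1f} miles per kWh")  # ~4.1 mi/kWh
```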

A swap like this is a rare event in the U.S. The Leaf’s replaceable battery is made by Ample, one of the only companies offering a service that’s more popular in Asian markets. In March, Ample announced that it had deployed five stations around the Bay Area. Nearly 100 Uber drivers are using them, the company says, making an average of 1.3 swaps per day. Ample’s operation is tiny compared with the 100,000 public EV chargers in the U.S.—not to mention the 150,000 gas stations running more than a million nozzles. Yet Ample’s founders, Khaled Hassounah and John de Souza, are convinced that it’s only a matter of time before the U.S. discovers that swapping is a necessary part of the transition to electric vehicles.

Here’s What 6G Will Be, According to the Creator of Massive MIMO

The COVID-19 pandemic, automation, and 6G could end the metropolitan era of building ever-taller corporate skyscrapers. Companies could operate as a network of homes, without anyone commuting to an office. That would help shrink urban heat islands and make our cities more efficient at moving both people and data.

Tom Marzetta is the director of NYU Wireless, New York University’s research center for cutting-edge wireless technologies. Prior to joining NYU Wireless, Marzetta was at Nokia Bell Labs, where he developed massive MIMO. Massive MIMO (MIMO is short for “multiple-input multiple-output”) allows engineers to pack dozens of small antennas into a single array. The high number of antennas means more signals can be sent and received at once, dramatically boosting a single cell tower’s efficiency.
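A hedged sketch of why more antennas help: in an idealized model with independent spatial streams, capacity grows roughly with the number of antennas. The formula and numbers below are textbook approximations, not figures from the interview.

```python
import math

# Illustrative only: idealized MIMO capacity scaling. With N transmit and
# M receive antennas supporting independent spatial streams, capacity grows
# roughly as min(N, M) * log2(1 + SNR/streams) — the intuition behind
# packing dozens of antennas into a massive MIMO array.

def mimo_capacity(n_tx, n_rx, snr_db):
    snr = 10 ** (snr_db / 10)
    streams = min(n_tx, n_rx)
    return streams * math.log2(1 + snr / streams)  # bits/s/Hz, equal power split

for n in (1, 4, 64):
    print(f"{n:>2} antennas: {mimo_capacity(n, n, snr_db=20):.1f} bits/s/Hz")
```

At 20 dB SNR this toy model jumps from about 6.7 bits/s/Hz with one antenna to roughly 87 with 64, which is the kind of efficiency gain the array buys.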

Massive MIMO is becoming an integral part of 5G, as is millimeter-wave technology, an independent development from NYU Wireless’s founding director, Ted Rappaport. And now the professors and students at NYU Wireless are already looking ahead to 6G and beyond.

Marzetta spoke with IEEE Spectrum about the work happening at NYU Wireless, as well as what we all might expect from 6G when it arrives in the next decade. The conversation below has been edited for clarity and length.

This Neural Network from OpenAI Can Learn from Small Datasets

Glow is an intriguing research effort into deep neural networks that can generalize from small training sets.


Since the early days of machine learning, artificial intelligence systems have faced two big challenges on the road to mainstream adoption. First, there is the data-efficiency problem: machine and deep learning models must be trained on large, accurate datasets which, as we know, are really expensive to build and maintain. Second, there is the generalization problem: AI agents struggle to build new knowledge that differs from their training data. Humans, by contrast, learn incredibly efficiently with minimal supervision and rapidly generalize knowledge from a few examples.

Generative models are one of the deep learning disciplines that addresses the two challenges mentioned above. Conceptually, generative models observe an initial dataset, like a set of pictures, and try to learn how the data was generated. In more mathematical terms, generative models try to infer all dependencies within very high-dimensional input data, usually specified in the form of a full joint probability distribution. Entire deep learning areas, such as speech synthesis and semi-supervised learning, are based on generative models. Generative models such as generative adversarial networks (GANs) have become extremely popular within the deep learning community. Recently, OpenAI experimented with a not-very-well-known technique called flow-based generative models in order to improve over existing methods.
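Flow-based models learn an invertible transformation between the data and a simple base distribution, which makes exact likelihoods computable via the change-of-variables formula. The toy one-dimensional affine flow below (a sketch, not Glow itself) shows the mechanics; Glow stacks many such invertible layers over images.

```python
import numpy as np

# Toy 1-D affine flow (NOT Glow itself): the invertible map z = (x - b) / s
# gives an exact likelihood via the change-of-variables formula:
#   log p(x) = log p_z(z) + log |dz/dx|
# where p_z is a simple base density, here a standard normal.

def standard_normal_logpdf(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def affine_flow_logpdf(x, scale, shift):
    z = (x - shift) / scale        # inverse transform into the base space
    log_det = -np.log(abs(scale))  # log |dz/dx| for the affine map
    return standard_normal_logpdf(z) + log_det

# With scale=2, shift=1 the model density is N(1, 2^2); at x = 1 the
# log-density is about -1.612, matching the Gaussian it parameterizes.
print(affine_flow_logpdf(1.0, scale=2.0, shift=1.0))
```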
