all 14 comments

[–]ZephirAWT[S] 1 insightful - 1 fun - (7 children)

How does DeepSeek R1 really fare against OpenAI’s best reasoning models?

R1 is a so-called reasoning model, i.e. a model specialized for reasoning. The division between "standard" and "reasoning" language models is not entirely sharp, though. If you force a language model to answer a question straight away, without any intermediate reasoning, you typically get a lower-quality result than if you let it think the problem through first. But unlike a human, a language model has no way to reason about something other than by talking. If you force a model to answer quickly, it makes logical errors similar to those of a human thinking with the metaphorical, instinctive System 1 (from Daniel Kahneman's famous book Thinking, Fast and Slow, built on his work with Amos Tversky).

Embedding the "reasoning" part into answers is an interesting feature, and I guess other large models will follow this trait too. I briefly tested the DeepSeek models available at ollama.com locally, and it seems to me that DeepSeek R1 isn't better than the language-model clones already exposed here. It also has a tendency to embed Chinese characters into a conversation and occasionally switches into Chinese completely. The publicly available models are indeed heavily censored too. See also:
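For anyone who wants to repeat that quick local test, here is a minimal sketch of querying an Ollama-served R1 distill over its local REST API. The model tag is an assumption and depends on which variant you pulled (e.g. deepseek-r1:7b):

```python
# Minimal sketch: query a locally pulled DeepSeek R1 distill via Ollama's REST API.
# Assumes the ollama server is running on its default port and the model tag exists
# locally (the exact tag, e.g. "deepseek-r1:7b", depends on which variant you pulled).
import json
import urllib.request

payload = {
    "model": "deepseek-r1:7b",   # assumed tag; adjust to your local pull
    "prompt": "Why is the sky blue? Answer briefly.",
    "stream": False,             # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())["response"]

# R1-style models wrap their chain of thought in <think>...</think> tags,
# so the visible answer follows the closing tag.
print(answer)
```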

[–]ZephirAWT[S] 1 insightful - 1 fun - (2 children)

DeepSeek's R1 curiously tells El Reg reader: 'My guidelines are set by OpenAI'

DeepSeek v3 was trained using data from GPT-4 output, which seems to be "quite common in training many LLMs," notes Microsoft software engineer Dongbo Wang on the developer site GitHub. The main change from the competition is that it only trains the necessary parts of the model (a mixture-of-experts design): only about 5% of the model parameters are active and updated per token (a small chunk of text). This should have led to a 95% reduction in GPU utilization compared to conventional training, without any loss of accuracy.
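To make that concrete, here is a toy sketch of the mixture-of-experts idea: a router picks a couple of experts for each token, so only a small fraction of the total parameters is touched per token. The dimensions, expert count and top-k below are made up for illustration and are not DeepSeek's actual configuration:

```python
# Toy mixture-of-experts layer: only the top-k experts (and hence only a small
# fraction of all parameters) are used for each token. Purely illustrative;
# sizes do not reflect DeepSeek's real architecture.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

router = rng.standard_normal((d_model, n_experts))            # routing weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # one weight matrix per expert

def moe_forward(token_vec):
    scores = token_vec @ router                    # affinity of this token to each expert
    chosen = np.argsort(scores)[-top_k:]           # keep only the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                       # normalise the gate weights
    out = sum(w * (token_vec @ experts[i]) for w, i in zip(weights, chosen))
    return out, chosen

token = rng.standard_normal(d_model)
_, active = moe_forward(token)
print(f"active experts for this token: {sorted(active.tolist())} "
      f"({top_k}/{n_experts} = {top_k / n_experts:.0%} of expert parameters)")
```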

For some questions, the model may need to "think" for several minutes before it's confident in the answer, while for others it may take only a couple of seconds. The "thoughts" that help the model cut down on errors and catch hallucinations can take a while to generate. They are additional stages of intermediate output that help guide the model to what is ideally a higher-quality final answer.

Normally, LLM generation speed is roughly memory bandwidth divided by model size at a given precision (parameter count times bytes per parameter). Theoretically, if you've got 3.35 TBps of memory bandwidth, you'd expect a 175-billion-parameter model run at 16-bit precision to achieve about 10 words a second, fast enough to spew about 250 words in under 30 seconds. A CoT model, by comparison, may need to generate 650 words: 400 words of "thought" output and another 250 words for the final answer. Unless you have 2.6x more memory bandwidth, or you shrink the model by the same factor, generating the response will now take more than a minute.
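Plugging the article's numbers into that rule of thumb (treating words and tokens as interchangeable here, which is an approximation) reproduces the figures above:

```python
# Back-of-the-envelope check of the throughput figures quoted above.
bandwidth = 3.35e12        # bytes/s of memory bandwidth (3.35 TB/s)
params = 175e9             # parameters
bytes_per_param = 2        # 16-bit precision

words_per_sec = bandwidth / (params * bytes_per_param)
print(f"{words_per_sec:.1f} words/s")                            # ~9.6, i.e. about 10

print(f"plain answer (250 words): {250 / words_per_sec:.0f} s")  # ~26 s, under 30 s
print(f"CoT answer  (650 words): {650 / words_per_sec:.0f} s")   # ~68 s, over a minute
print(f"extra output factor: {650 / 250:.1f}x")                  # 2.6x
```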

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

Did DeepSeek steal from OpenAI?

OpenAI: stealing data for training is ok

DeepSeek: Ok

OpenAI: Wait! Not like that!

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

“Torrenting from a corporate laptop doesn’t feel right”: Meta emails unsealed

Newly unsealed emails allegedly provide the "most damning evidence" yet against Meta in a copyright case raised by book authors alleging that Meta illegally trained its AI models on pirated books.

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

Janus-Pro Image Generator (testbed) builds on the previous Janus model. Its results are intended to roughly match the capabilities of OpenAI's DALL-E, except that for now it generates images at a resolution of only 384 × 384 pixels, whereas DALL-E provides 1024 × 1024 pixels in its default setting. See also:

DeepSeek Drops Janus Pro - Vision AND Image Gen In ONE Model

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

How Chinese A.I. Start-Up DeepSeek Is Competing With Silicon Valley Giants (Gift Article)

The company built a cheaper, competitive chatbot with fewer high-end computer chips than U.S. behemoths like Google and OpenAI, showing the limits of overpriced dedicated A.I. chips. See also:

How does DeepSeek R1 work on OLD Nvidia chips? The development of proprietary, extremely fast network-cluster software written at the assembly level instead of in CUDA had the lion's share in the Chinese success.

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

DeepSeek decided to use PTX (Parallel Thread Execution), a low-level language developed by NVIDIA, instead of relying directly on CUDA. This strategy allowed them to achieve remarkable efficiencies without falling victim to US trade restrictions... but only because those restrictions don't apply to NVIDIA's older (PTX-compatible) hardware, not because PTX itself is free (it's just as proprietary as CUDA). Thus, although the use of PTX shows that there are viable alternatives within NVIDIA's own ecosystem, experts warn that this solution is still not a sufficient guarantee of China's technological independence: it is a technology developed by the same company that controls CUDA, so the risk of external constraints would still hang over the head of the Chinese industry.

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

The Incredible Physics Behind AI Images: How DALL-E, Imagen, and MidJourney Work

The equations describe dynamic movement and the forces causing it. These can in turn be expressed in terms of potential and kinetic energy, with the total energy always set equal to 1. The "reversing" of the entropy would therefore have a bit-by-bit acceleration (+ and -) added to the existing shifting of bits, depending on whether a potential or a kinetic energy is assumed to be moving the bits, while always limiting the total summed energy to 1. Deciding whether potential or kinetic energy applies, without reference to historical data, can only be inferred from the bit distribution.
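For context, the "entropy reversal" the article describes is the standard diffusion-model recipe: corrupt an image with noise step by step, then train a network to undo the noise. A toy 1-D sketch of the forward (noising) process, with a made-up noise schedule, could look roughly like this:

```python
# Toy 1-D illustration of the diffusion idea behind DALL-E/Imagen-style models:
# the forward process gradually replaces signal with Gaussian noise, and a model
# is trained to reverse that, step by step. Schedule values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 2 * np.pi, 32))   # the "image" (a clean 1-D signal)

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # noise schedule (made up)
alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal-retention factor

def noised(x0, t):
    """Sample x_t directly: sqrt(a_bar)*x0 + sqrt(1-a_bar)*noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

for t in (0, T // 2, T - 1):
    xt = noised(x0, t)
    print(f"t={t:4d}  signal fraction ~ {np.sqrt(alpha_bar[t]):.2f}  "
          f"sample std ~ {xt.std():.2f}")

# A trained denoiser would predict eps from (x_t, t) and run these steps in
# reverse, turning pure noise back into an image.
```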

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

JD Vance Puts European Leaders On Notice About Trying To Regulate U.S. Tech Giants

The Trump administration would hardly restrain dystopian applications of A.I. technologies any more than the globalist NWO of progressives would...

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

$1 million prize announced for the unsolved Indus script

An A.I. would communicate in the ancient script without problems after being trained on a sufficiently large sample of it, but also without any understanding of it (or any ability to translate it into other languages). I guess this example best demonstrates what artificial intelligence actually is in its present state of development.

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

Study finds that ChatGPT, one of the world’s most popular conversational AI systems, tends to lean toward left-wing political views. The system not only produces more left-leaning text and images but also often refuses to generate content that presents conservative perspectives.

[–]ZephirAWT[S] 1 insightful - 1 fun - (0 children)

DeepSeek’s revolutionary AI model has wiped nearly $100 billion off the fortunes of the world’s richest people, including Mark Zuckerberg, Larry Ellison, and Elon Musk.

Like, what did they actually expect? A.I. is a technological hype, and LLMs are all just about brute computational power, in which no one can compete with China anymore.