cpus

Most people think of machine instructions as the fundamental steps that a computer performs. However, many processors have another layer of ...

Hello you fine Internet folks,

In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line. I've been examining the Pentium'...

It is often said that companies – particularly large companies with enormous IT budgets – do not buy products, they buy roadmaps. No one wants to go to

Intel released the powerful Pentium processor in 1993, establishing a long-running brand of high-performance processors. The Pentium incl...

After persistent rumors refused to recede, AMD steps in with a clear explanation why dual-CCD V-Cache doesn't exist.

In 1993, Intel released the high-performance Pentium processor, the start of the long-running Pentium line. The Pentium had many improvement...

The CCD stack with 3D V-Cache on the AMD Ryzen 7 9800X3D is only 40-45µm in total, but the rest of the layers add up to a whopping 750µm.

I was studying the silicon die of the Pentium processor and noticed some puzzling structures where signal lines were connected to the silico...

A loop buffer sits at a CPU's frontend, where it holds a small number of previously fetched instructions.

Intel was a dominant leader in the CPU market for the better part of a decade, but AMD has seen massive success in recent years thanks to its Ryzen chips.

Nitro, Graviton, EFA, Inferentia, Trainium, Nvidia Cloud, Microsoft Azure, Google Cloud, Oracle Cloud, Handicapping Infrastructure, AI As A Service, Enterprise Automation, Meta, Coreweave, TCO

No matter how elegant and clever the design is for a compute engine, the difficulty and cost of moving existing – and sometimes very old – code from the

Intel’s Meteor Lake chip signaled a change in Intel’s mobile strategy, moving away from the monolithic designs that had characterized Intel’s client designs for more than a decade.

Intel's Core Ultra 200 "Arrow Lake" Desktop CPU specifications have now been finalized and we are just a month away from the official launch.

There are many chip partitioning and placement tradeoffs when comparing top-tier smartphone processor designs.

When I recently interviewed Mike Clark, he told me, “…you’ll see the actual foundational lift play out in the future on Zen 6, even though it was really Zen 5 that set the table for that.” And at that same Zen 5 architecture event, AMD’s Chief Technology Officer Mark Papermaster said, “Zen 5 is a ground-up redesign of the Zen architecture,” which has brought numerous and impactful changes to the design of the core.

Server CPUs have pushed high core counts for a long time, though the way they got high core counts has varied.

A Finnish startup called Flow Computing is making one of the wildest claims ever heard in silicon engineering: by adding its proprietary companion chip,

192 cores, 384 threads, socket compatibility. What's not to like?

Anton Shilov reports via Tom's Hardware: About half of the processors packaged in Russia are defective. This has prompted Baikal Electronics, a Russian processor developer, to expand the number of packaging partners in the country, according to a report in Vedomosti, a Russian-language business dai...

Researchers also disclosed a separate bug called “Inception” for newer AMD CPUs.

Downfall attacks target a critical weakness found in billions of modern processors used in personal and cloud computers.

In this article we will learn about its definition, differences and how to calculate FLOPs and MACs using Python packages.
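
To make the FLOPs versus MACs distinction concrete, here is a minimal hand-calculation sketch. It does not use the Python packages the article refers to. The helper names are hypothetical, bias terms and activations are ignored, and it assumes the common convention that one multiply-accumulate (MAC) counts as two floating-point operations.

```python
def dense_macs(in_features: int, out_features: int) -> int:
    """MACs for one fully connected layer applied to a single input vector."""
    return in_features * out_features


def conv2d_macs(in_channels: int, out_channels: int,
                kernel_h: int, kernel_w: int,
                out_h: int, out_w: int) -> int:
    """MACs for one standard 2D convolution applied to a single input image.

    Each output element needs kernel_h * kernel_w * in_channels MACs.
    """
    per_output = kernel_h * kernel_w * in_channels
    return per_output * out_channels * out_h * out_w


def macs_to_flops(macs: int) -> int:
    """Assumed convention: one MAC = one multiply + one add = 2 FLOPs."""
    return 2 * macs


if __name__ == "__main__":
    macs = dense_macs(784, 128)
    print("dense MACs:", macs)
    print("dense FLOPs:", macs_to_flops(macs))
    print("conv MACs:", conv2d_macs(3, 16, 3, 3, 32, 32))
```

Tools that count operations automatically can differ in whether they report MACs or FLOPs under the same label, so it is worth checking which convention a given tool uses before comparing numbers.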

The company’s PowerVia interconnect tech demonstrated a 6 percent performance gain

GPUs may dominate, but CPUs could be perfect for smaller AI models

Tech enthusiasts probably know ARM as a company that develops reasonably performant CPU architectures with a focus on power efficiency.

Over the past 10-15 years, per-core throughput increases have slowed, and in response CPU designers have scaled up core counts and socket counts to continue increasing performance across generations of new CPU models.

While programmers today take division for granted, most microprocessors in the 1970s could only add and subtract — division required a sl...

While microprocessors are used in various applications, they are precluded from use in high-energy physics applications due to the harsh radiation present. To overcome this limitation a...

In the march to more capable, faster, smaller, and lower…

Sponsored Feature: Training an AI model takes an enormous amount of compute capacity coupled with high bandwidth memory. Because the model training can be

It was only a matter of time, perhaps, but the skyrocketing costs of designing chips are colliding with the ever-increasing need for performance,

Chinese chip designer Loongson, which has tried to reduce the country’s reliance on Intel and AMD, is developing its own general-purpose GPU despite being added to a US trade blacklist.

How to considerably reduce training time by changing only one line of code

Just one instruction at a time!

If a few cores are good, then a lot of cores ought to be better. But when it comes to HPC this isn’t always the case, despite what the Top500 ranking –

The groundbreaking 8086 microprocessor was introduced by Intel in 1978 and led to the x86 architecture that still dominates desktop and se...

Absolute Reticle Limit

Both companies are rolling out mitigations, but they add overhead of 12 to 28 percent.

Hertzbleed attack targets power-conservation feature found on virtually all modern CPUs.

HPC-oriented Latency Numbers Every Programmer Should Know · GitHub

In 2004 I was working for Microsoft in the Xbox group, and a new console was being created. I got a copy of the detailed descriptions of the Xbox 360 CPU and I read it through multiple times and su…

Nallatech doesn't make FPGAs, but it does have several decades of experience turning FPGAs into devices and systems that companies can deploy to solve

In this work, we analyze the performance of neural networks on a variety of heterogenous platforms. We strive to find the best platform in terms of raw benchmark performance, performance per watt a…

An accelerator unit improves both the performance and efficiency of a system by taking over one simple task

The CORE-V CVA6 is an Application class 6-stage RISC-V CPU capable of booting Linux - openhwgroup/cva6

cirosantilli/x86-bare-metal-examples: Dozens of minimal operating systems to learn x86 system programming. Tested on Ubuntu 17.10 host in QEMU 2.10 and real hardware. Userland cheat at: https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly ARM baremetal setup at: https://github.com/cirosantilli/linux-kernel-module-cheat#baremetal-setup

A detailed, critical, technical essay on upcoming CPU architectures.

Software optimization manuals for C++ and assembly code. Intel and AMD x86 microprocessors. Windows, Linux, BSD, Mac OS X. 16, 32 and 64 bit systems. Detailed descriptions of microarchitectures.

There are some features in any architecture that are essential, foundational, and non-negotiable. Right up to the moment that some clever architect shows

AMD recently unveiled 3D V-Cache, their first 3D-stacked technology-based product. Leapfrogging contemporary 3D bonding technologies, AMD jumped directly into advanced packaging with direct bonding and an order of magnitude higher wire density.

Although competition from Arm is increasing, AMD remains Intel’s biggest competitor, as concerns of losing market share weigh on Intel’s valuation.

A new CPU design has won accolades for defeating the hacking efforts of nearly 600 experts during a DARPA challenge. Its approach could help us close side-channel vulnerabilities in the future.

Apple is positioning its M1 quite differently from any CPU Intel or AMD has released. The long-term impact on the PC market could be significant.

Sapphire Rapids, Intel's next server architecture, looks like a large leap over the just-launched Ice Lake SP.

Rice University computer scientists have demonstrated artificial intelligence (AI) software that runs on commodity processors and trains deep neural networks 15 times faster than platforms based on graphics ...

The “Milan” Epyc 7003 processors, the third generation of AMD’s revitalized server CPUs, are now in the field, and we await the entry of the “Ice Lake”

AMD is one of the oldest designers of large scale microprocessors and has been the subject of polarizing debate among technology enthusiasts for nearly 50 years. Its...

With every passing year, as AMD first talked about its plans to re-enter the server processor arena and give Intel some real, much needed, and very direct

Understanding Intel® processor names and numbers helps identify the best laptop, desktop, or mobile device CPU for your computing needs.

There’s something really quite subtle about how the nproc utility from GNU coreutils works. If you look at the man page, it’s even the very first sentence: Print the number of processin…
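
As a rough illustration of the distinction the nproc man page draws, the Python sketch below compares the CPUs the current process is allowed to run on with the CPUs the system reports. It is an approximation, not a reimplementation of nproc: the function names are made up for this example, `os.sched_getaffinity` is only available on some platforms (Linux among them), and nproc also honors settings such as `OMP_NUM_THREADS` that this sketch ignores.

```python
import os


def available_cpus() -> int:
    """CPUs the current process may be scheduled on (roughly plain `nproc`)."""
    if hasattr(os, "sched_getaffinity"):
        # The affinity mask can be narrowed by taskset, cgroup cpusets, etc.
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without an affinity API.
    return os.cpu_count() or 1


def system_cpus() -> int:
    """Logical CPUs reported by the system, regardless of this process's affinity."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("available to this process:", available_cpus())
    print("reported by the system:   ", system_cpus())
```

Running the script under something like `taskset -c 0 python3 script.py` on Linux should make the two numbers diverge, which is the subtlety that matters when sizing thread pools inside containers or pinned jobs.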

In this article, I would like to briefly describe the methods used to dump and restore the different kinds of registers on 32-bit and 64-bit x86 CPUs. The first part will focus on General Purpose Registers, Debug Registers and Floating-Point Registers up to the XMM registers provided by the SSE extension. I will explain how their values can be obtained via the ptrace(2) interface.

TamaGo - ARM/RISC-V bare metal Go.

They say "performance is king"... It was true a decade ago and it certainly is now. With more and mor...

When it comes to hashing, sometimes 64 bits are not enough, for example because of the birthday paradox: the hacker can iterate through random $2^{32}$ entities and it can be proven that wi…
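
The teaser above is cut off, so the following is not the article's own derivation. It is a small sketch of the standard birthday-bound approximation, $p \approx 1 - e^{-n(n-1)/(2N)}$, for $n$ uniformly random values drawn from a space of $N = 2^{\text{bits}}$ hashes. The function name is hypothetical, and an ideal, uniformly distributed hash is assumed.

```python
import math


def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate probability of at least one collision among num_items
    uniformly random hash values of hash_bits bits (birthday bound)."""
    space = 1 << hash_bits
    exponent = num_items * (num_items - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    n = 1 << 32
    print("64-bit hash: ", collision_probability(n, 64))
    print("128-bit hash:", collision_probability(n, 128))
```

Under these assumptions, $2^{32}$ items in a 64-bit space give a collision probability of roughly 0.39, while the same number of items in a 128-bit space gives a negligible one, which is the usual argument for wider hashes.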

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

Fujitsu Limited today announced that it began shipping the supercomputer Fugaku, which is jointly developed with RIKEN and promoted by the Ministry of Education, Culture, Sports, Science and Technology with the aim of starting general operation between 2021 and 2022. The first machine to be shipped this time is one of the computer units of Fugaku, a supercomputer system comprised of over 150,000 high-performance CPUs connected together. Fujitsu will continue to deliver the units to RIKEN Center for Computational Science in Kobe, Japan, for installation and tuning.

Do the best you can until you know better. Then when you know better, do better. ― Maya Angelou

On the Linux command line it is fairly easy to use the perf command to measure the number of floating point operations (or other performance metrics); see, for example, this old blog post. With this approach, however, it is not easy to get a fine-grained view of the different stages of processing within a single process. In this short note I describe how the python-papi package can be used to measure the FLOP requirements of any section of a Python program.

Recent leaks may shed some light on Intel's upcoming mainstream desktop Comet Lake-S CPUs.

Intel's Tremont CPU microarchitecture will be the foundation of next-generation, low-power processors that target a wide variety of products across

A post describing how C programs get to the main function. Devicetree layouts, linker scripts, minimal C runtimes, GDB and QEMU, basic RISC-V assembly, and other topics are reviewed along the way.

Course overview: Memory leaks and dangling pointers are the main issues of manual memory management. You delete a parent node in a linked list, forgetting to delete all its children first -- and your

Excessive instruction cache misses are the kind of a performance problem that's going to appear only in larger codebases. In this article, I'm describing some ideas on how to deal with this issue.

Repository for the tools and non-commercial data used for the "Accelerator wall" paper. - PrincetonUniversity/accelerator-wall

Monday night Amazon announced the new 'A1' instance type for the Elastic Compute Cloud (EC2) that is powered by their own 'Graviton' ARMv8 processors.

It might have been difficult to see this happening a mere few years ago, but the National Nuclear Security Administration and one of its key

Every major tech company is looking at quantum computers as the next big breakthrough in computing. Teams at Google, Microsoft, Intel, IBM and various