Pros
- Ferociously powerful for a single-GPU card
- Power consumption is relatively low for this level of raw GPU performance
- Usual exceptional Founders Edition build quality
Cons
- Pricey
- Almost impractically enormous
- Raw power appears, at times, to bottleneck a Core i9-12900K CPU
Nvidia GeForce RTX 4090 Founders Edition Specs
| Specification | Value |
|---|---|
| Graphics Processor | Nvidia AD102 |
| GPU Base Clock | 2239 MHz |
| GPU Boost Clock | 2520 MHz |
| Graphics Memory Type | GDDR6X |
| Graphics Memory Amount | 24 GB |
| HDMI Outputs | 1 |
| DisplayPort Outputs | 3 |
| VirtualLink Outputs | None |
| Number of Fans | 2 |
| Card Width | Triple-slot |
| Card Length | 12 inches |
| Board Power or TDP | 450 watts |
| Power Connector(s) | 1 16-pin 12VHPWR (adapter for four 8-pin PCIe cables included) |
Nvidia has played a pivotal role in PC graphics since its founding. Its first graphics accelerator was not a success, but that was the exception: far more often, the company’s new graphics cards deliver substantial performance gains and push the whole industry forward. So when Nvidia claims that “Ada (Lovelace) offers the most significant generational performance enhancement in Nvidia’s history,” we pay attention.
Verifying that claim definitively would require a thorough look at several decades of Nvidia products. However, after testing our first graphics card based on the Ada Lovelace architecture, we think it may well be true. The performance gains from Lovelace are plain to see in Nvidia’s new GeForce RTX 4090 Founders Edition: they are substantial, and if this is not the company’s largest generational leap, it is certainly among the largest. The card’s price may seem exorbitant, but it reflects the capabilities of this exceptional product. At present, no other card comes close, particularly for playing modern games at 4K resolution.
Ada Lovelace: Nvidia’s Blueprint for Achievement
The GeForce RTX 4090’s performance gains stem primarily from three elements: architecture, hardware resources, and manufacturing process. We will begin with the architecture and review the fundamental building blocks of the GeForce RTX 4090.
At a high level, the Ada Lovelace architecture looks much like the preceding “Ampere” architecture used in the GeForce RTX 3000 series. The internal hardware resources of Ada Lovelace GPUs are organized into Graphics Processing Clusters (GPCs); each GPC comprises six Texture Processing Clusters (TPCs), and each TPC contains two Streaming Multiprocessors (SMs).
Each SM contains a variety of functional blocks: 128 CUDA cores, 128KB of L1 cache, four Texture Mapping Units (TMUs), and four Tensor cores. Each Ada Lovelace SM also includes a third-generation Ray-Tracing (RT) core, which Nvidia has enhanced with new fixed-function hardware, discussed in more detail shortly.
In addition to the streaming multiprocessors (SMs), there are several other significant hardware resources located on the GPU die. These resources encompass the raster operations pipelines (ROPs), the L2 cache, memory controllers, media encoders and decoders, along with various other components. However, the majority of the processing capabilities are concentrated within the SMs.
Third-Generation Ray Tracing Technology
From an architectural perspective, Nvidia’s most notable changes in Ada Lovelace are to the RT cores. Two new fixed-function hardware units within the RT cores are designed to speed up ray tracing while offloading work from other parts of the GPU, including the CUDA cores.
The first is what Nvidia calls the Opacity Micromap Engine, which handles alpha-tested geometry. Nvidia has provided the diagram below to demonstrate how this engine works; on either side of the diagram is an illustration of a leaf. The GPU must determine how a traced ray of light affects that leaf. An Ampere RT core would have to process the entire image to work this out, but the Opacity Micromap Engine significantly reduces this workload.
In the map, white tiles are marked as transparent, so no lighting work is needed for those areas. Green tiles are fully opaque, so the GPU must perform lighting calculations for those sections. Brown and blue tiles are a mix of the two and require additional calculations, which are handed off to the shader cores; these run code to determine how the ray affects the rest of the image. In this example, the shaders’ workload is cut by more than half, although the savings will vary from image to image. Overall, Nvidia says the Opacity Micromap Engine doubles scene-traversal performance in applications that use alpha-tested geometry.
The second new hardware unit is the Displaced Micro-Mesh Engine, designed to reduce the performance cost of building and storing bounding volume hierarchies (BVHs). BVHs are essential for handling geometrically complex objects, and they represent a significant burden in graphics workloads. Nvidia points out that a hundred-fold increase in geometry could mean a hundred-fold increase in BVH construction time, along with a similar increase in memory consumption. This challenge will only grow as ray-traced workloads become more demanding.
The Displaced Micro-Mesh Engine is designed to build BVHs more efficiently while reducing their memory footprint, but this comes at a cost: Nvidia has observed slight image distortion, likely caused by the data compression involved. For this reason, the engine is not used for rasterized graphics, where BVH-related overhead matters less. Given the visual benefits of ray tracing, we believe the advantages of this technology outweigh its drawbacks. That said, we have not yet noticed any visible differences ourselves, so this remains an open question.
The most significant change Nvidia has made alongside its third-generation RT hardware is a new method for reorganizing ray-traced workloads. It groups similar ray-tracing tasks together so they can execute more efficiently in the processing pipeline. Known as Shader Execution Reordering (SER), this technique is said to improve performance by as much as 44%, depending on the circumstances.
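The tile classification described above can be sketched in a few lines of Python. This is a toy model, not Nvidia’s implementation: the 16-tile leaf, the state names, and the `shader_work` helper are hypothetical, and we assume (loosely following the four states an opacity micromap can store) that only the two mixed states, the brown and blue tiles, are sent to the shader cores.

```python
from collections import Counter

# Hypothetical tile states, loosely modeled on the four states an opacity
# micromap can store. Colors refer to Nvidia's leaf diagram.
TRANSPARENT = "transparent"                  # white: ray passes through
OPAQUE = "opaque"                            # green: ray is blocked
UNKNOWN_OPAQUE = "unknown_opaque"            # mixed tile (brown or blue)
UNKNOWN_TRANSPARENT = "unknown_transparent"  # mixed tile (brown or blue)

NEEDS_SHADER = {UNKNOWN_OPAQUE, UNKNOWN_TRANSPARENT}


def shader_work(micromap):
    """Count tiles that still need an alpha-test shader invocation.

    Fully transparent and fully opaque tiles are resolved without shader
    code in this model; only mixed tiles are handed to the shader cores.
    """
    counts = Counter(micromap)
    return sum(counts[state] for state in NEEDS_SHADER)


T, O = TRANSPARENT, OPAQUE
UO, UT = UNKNOWN_OPAQUE, UNKNOWN_TRANSPARENT

# A made-up 4x4 leaf: solid center, partially covered edges.
leaf = [
    T, T, UT, T,
    T, UO, O, UT,
    UO, O, O, UO,
    T, UT, O, T,
]

without_omm = len(leaf)       # without a micromap, every tile goes to a shader
with_omm = shader_work(leaf)  # with one, only the mixed tiles do
print(f"shader invocations: {without_omm} -> {with_omm}")  # 16 -> 6
```

With this made-up leaf, 6 of 16 tiles still need shader work, a cut of more than half; a different image would give a different ratio.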
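To see why BVH cost grows with geometry, here is a minimal median-split BVH builder in Python. It is a generic textbook construction, not Nvidia’s (or any driver’s) BVH builder, and it uses 1D intervals instead of 3D boxes to stay short; `build` and `count_nodes` are hypothetical helpers. Because each leaf holds one primitive, a binary BVH over n primitives always has 2n - 1 nodes, so node count, and the memory needed to store it, scales linearly with geometry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float]  # 1D interval (min, max) standing in for a 3D box


@dataclass
class Node:
    bounds: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None  # primitive index, set only on leaves


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), max(a[1], b[1]))


def build(prims: List[Box], idx: List[int]) -> Node:
    """Median-split build with one primitive per leaf (illustrative only)."""
    if len(idx) == 1:
        return Node(prims[idx[0]], prim=idx[0])
    # Sort primitives by box center and split the list in half.
    idx = sorted(idx, key=lambda i: prims[i][0] + prims[i][1])
    mid = len(idx) // 2
    left = build(prims, idx[:mid])
    right = build(prims, idx[mid:])
    return Node(union(left.bounds, right.bounds), left, right)


def count_nodes(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


for n in (100, 10_000):
    prims = [(float(i), float(i) + 1.0) for i in range(n)]
    root = build(prims, list(range(n)))
    print(n, "primitives ->", count_nodes(root), "nodes")
```

Going from 100 to 10,000 primitives multiplies the node count by roughly 100, mirroring the memory scaling Nvidia describes; build time in this sketch grows at least as fast.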
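The basic idea behind reordering can be illustrated with a toy model. This sketch assumes 32-thread warps and a hypothetical scene in which each ray hits one of eight materials; it simply sorts rays by the shader they need, which is a simplification and not how SER is actually implemented in hardware or exposed to developers. The hypothetical `divergence` function counts how many distinct shaders each warp must run in turn.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep on Nvidia GPUs


def divergence(shader_ids):
    """Sum, over warps, of the distinct shaders each warp must execute.

    A warp whose rays hit different materials runs each material's shader
    one after another, so fewer distinct IDs per warp means less wasted work.
    """
    total = 0
    for start in range(0, len(shader_ids), WARP_SIZE):
        total += len(set(shader_ids[start:start + WARP_SIZE]))
    return total


random.seed(0)
# Hypothetical workload: 1,024 rays, each hitting one of 8 materials.
hits = [random.randrange(8) for _ in range(1024)]

before = divergence(hits)
# Toy reordering: group rays by the shader they need before shading.
after = divergence(sorted(hits))
print("shader passes before:", before, "after:", after)
```

Sorting collapses most warps to a single shader, which is the kind of coherence gain SER targets; how much that helps in practice depends on the workload.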
Custom Fabrication by TSMC: Introducing the AD102 GPU
Nvidia has returned to TSMC to manufacture its Ada Lovelace GPUs. Although the two companies have a long-standing partnership, Nvidia broke from it by choosing Samsung to fabricate its Ampere GPUs.
With Ada Lovelace, the two companies appear to have struck a new deal. Nvidia is using a new 4nm TSMC manufacturing process that has reportedly been tailored specifically for Nvidia’s GPUs. Moving to a newer process typically lowers power consumption and cost while enabling higher clock speeds, and it allows more hardware resources to fit into each square millimeter of silicon.
The GeForce RTX 4090 uses an AD102 GPU with 12 GPCs in total, one of which is disabled, which hints at a possible future Nvidia GeForce RTX 4090 Ti. In its current form, the RTX 4090 has 16,384 active CUDA cores out of a possible 18,432; that gap is larger than one GPC’s worth of cores, so a few additional SMs are disabled as well. The RTX 4090 has 5,632 more CUDA cores than the Nvidia GeForce RTX 3090 Ti, and this substantial increase clearly accounts for much of its performance gain.
Unusually, memory bandwidth did not change from the RTX 3090 Ti to the RTX 4090: both cards pair a 384-bit memory interface with 24GB of GDDR6X RAM. However, Nvidia has substantially enlarged the caches in the AD102 chip, which should boost performance significantly. The RTX 4090 has 16MB of L1 cache and 72MB of L2 cache, compared with a comparatively modest 10.5MB of L1 cache and 6MB of L2 cache on the RTX 3090 Ti.
The large L2 cache points to another characteristic of the AD102 die: its 76.3 billion transistors, far more than the 28.3 billion in the GA102, a count that once seemed large but now looks modest. Core counts and clock speeds are the metrics most often cited when comparing chips, but transistor count is frequently a more revealing measure of how products differ, as this case clearly demonstrates. Notably, Nvidia achieved this on an even smaller die, which should make the chip easier and cheaper to manufacture.
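The core counts above follow directly from the hierarchy described earlier (six TPCs per GPC, two SMs per TPC, 128 CUDA cores per SM). The short Python check below computes the full AD102 total and shows that disabling one GPC alone would still leave 16,896 cores, so a few more SMs must be disabled to reach the RTX 4090’s 16,384. The variable names are ours.

```python
# Hierarchy figures from the architecture description above.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 128

full_sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC               # 144 SMs
full_cores = full_sms * CUDA_CORES_PER_SM                   # 18,432 cores

rtx_4090_cores = 16_384
rtx_4090_sms = rtx_4090_cores // CUDA_CORES_PER_SM          # 128 SMs enabled

# If exactly one GPC were disabled and nothing else:
one_gpc_off_sms = (GPCS - 1) * TPCS_PER_GPC * SMS_PER_TPC   # 132 SMs
one_gpc_off_cores = one_gpc_off_sms * CUDA_CORES_PER_SM     # 16,896 cores
extra_sms_disabled = one_gpc_off_sms - rtx_4090_sms         # 4 more SMs

print(full_cores, rtx_4090_sms, one_gpc_off_cores, extra_sms_disabled)
```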
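The cache and bandwidth figures can be cross-checked with simple arithmetic. This sketch assumes both generations have 128 CUDA cores and 128KB of L1 per SM, derives the RTX 3090 Ti’s core count from the 5,632-core difference, and assumes a 21 Gbps per-pin GDDR6X data rate for both cards, a figure not stated in this review. `total_l1_mb` and `bandwidth_gb_s` are hypothetical helpers.

```python
L1_PER_SM_KB = 128   # assumed per-SM L1 size for both Ada and Ampere
CORES_PER_SM = 128   # assumed CUDA cores per SM for both generations


def total_l1_mb(cuda_cores):
    """Total L1 cache in MB, derived from the core count."""
    sms = cuda_cores // CORES_PER_SM
    return sms * L1_PER_SM_KB / 1024


def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak bandwidth = bus width in bytes x per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


rtx_4090_cores = 16_384
rtx_3090_ti_cores = rtx_4090_cores - 5_632   # 10,752

print(total_l1_mb(rtx_4090_cores))      # 16.0
print(total_l1_mb(rtx_3090_ti_cores))   # 10.5
# Assumption: both cards run their GDDR6X at 21 Gbps per pin.
print(bandwidth_gb_s(384, 21))          # 1008.0
```

Identical bus width and data rate yield identical peak bandwidth, so any memory-side gains on the RTX 4090 have to come from its larger caches.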
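Transistor density makes the “more transistors on a smaller die” point concrete. The transistor counts come from the text above; the die areas (about 608.5 mm² for AD102 and 628.4 mm² for GA102) are outside figures we are assuming here, so treat the output as approximate.

```python
def density_mtr_per_mm2(transistors, die_area_mm2):
    """Millions of transistors per square millimeter."""
    return transistors / die_area_mm2 / 1e6


# Transistor counts from the review; die areas are assumed outside figures.
chips = {
    "AD102": (76.3e9, 608.5),
    "GA102": (28.3e9, 628.4),
}

for name, (transistors, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm^2")

# Ratio of total transistor counts, independent of die area.
print(f"transistor ratio: {76.3e9 / 28.3e9:.2f}x")
```

Under these assumed areas, AD102 packs nearly three times as many transistors into each square millimeter as GA102.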
An Overview of the RTX 4090 Founders Edition Graphics Card
Before we turn to test results, a few less prominent features of the new Ada Lovelace GPUs deserve mention. They include eighth-generation NVENC media hardware with full support for AV1 encoding and decoding. AV1 encoding is reportedly 40% more efficient than H.264 encoding on Ampere-based GPUs. In addition, Nvidia’s GeForce RTX 40-series GPUs with 12GB of memory or more, including the RTX 4090, have two NVENC encoders, which improves encoding performance and enables 8K/60Hz encoding.
The size of this graphics card deserves special emphasis: the Nvidia GeForce RTX 4090 Founders Edition is both wider and longer than any other Founders Edition card we have seen.
The card is nearly 12 inches (304mm) long, 5.4 inches (137mm) wide, and 2.4 inches (61mm) thick, occupying three PCI Express expansion slots. The image shows how much of the backplate is given over to exhaust.
The card provides three DisplayPort outputs and one HDMI port.
The card uses the new 12VHPWR power connector and ships with an adapter that connects four eight-pin PCI Express power cables to it. Not all four connections are required for the card to operate, however.
An Overview of DLSS 3
A major focus of Nvidia’s development work for the RTX 40-series launch has been a new version of Deep Learning Super Sampling, called DLSS 3. Like its predecessors, DLSS 3 uses deep learning to improve rendering, and because it must be implemented game by game, it will not be universally available. How DLSS 3 works, however, differs considerably from earlier versions.
To some degree, DLSS 3 works like the motion-smoothing features found on other devices, such as smart TVs and smartphones, but it performs significantly better than they do. At its core, DLSS 3 generates new frames between two existing ones. The GPU renders Frame 1, then Frame 2, and examines the differences between them. It then generates a new frame at the midpoint and inserts it between the two.
Consider a TV program showing a person in motion. In the first frame, a man has one foot raised. In the second, that foot is on the ground, meaning he has just completed a step. The GPU then generates an intermediate frame that shows the foot still raised, but lower than in the first frame, and inserts it between the two.
The result is smoother motion, though this can come at the cost of increased latency and reduced image quality. Many televisions offer a comparable feature aimed at sports broadcasts, and VLC Player’s Yadif (2x) deinterlacing mode doubles the frame rate, playing 24 frames per second (fps) content at 48 fps and 30 fps content at 60 fps. Another multimedia application, Smooth Video Player (SVP), is a more resource-hungry option that aims for the same effect. Standalone devices such as a GBS-C or RetroTink can also produce similar results with any graphics card.
The key difference among these technologies is how well they work. VLC Player’s Yadif (2x) option is relatively efficient but does not always produce the best results. I have used SVP for a long time because its results are superior, but it needs substantially more system resources. It does offer a setting to reduce that load, but lowering it also degrades image quality.
Nvidia has used the power of the AD102 GPU to mitigate this technology’s potential drawbacks while still delivering a significant performance gain. In our informal testing of several games that support DLSS 3, we did not observe any significant ghosting or the other graphical artifacts typically associated with frame-doubling technologies. We saw a considerable FPS increase in most titles we tried, but time constraints prevented us from testing our full range of graphics cards in these new games for a complete comparison. For that reason, our DLSS 3 findings are presented separately below.
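The walking example can be reduced to a toy calculation. This sketch simply averages two frames’ values to place the generated frame at the midpoint; it is plain linear interpolation, not DLSS 3’s method, which relies on motion vectors and optical flow. The `Frame` type, the tracked foot height, and the roughly 60 fps frame spacing are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    time_ms: float
    foot_height_cm: float  # the single value we track in this toy scene


def midpoint_frame(a: Frame, b: Frame) -> Frame:
    """Place a generated frame halfway between two rendered frames.

    Plain linear interpolation for illustration only; real frame generation
    estimates per-pixel motion instead of averaging one value.
    """
    return Frame((a.time_ms + b.time_ms) / 2,
                 (a.foot_height_cm + b.foot_height_cm) / 2)


frame1 = Frame(0.0, 20.0)   # foot raised
frame2 = Frame(16.7, 0.0)   # foot on the ground, about one 60 fps frame later
generated = midpoint_frame(frame1, frame2)
print(generated.foot_height_cm)  # 10.0: still raised, but lower than frame 1
```

Inserting one generated frame between each pair of rendered frames roughly doubles the number of frames displayed.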
Evaluating the GeForce RTX 4090 Founders Edition: Unleashing the Powerhouse
We tested the Nvidia GeForce RTX 4090 on our updated 2022 GPU test bed. It pairs a stock Intel Core i9-12900K processor, cooled by a 240mm Corsair Hydro Series H100X liquid cooler, with an Asus ROG Maximus Z690 Hero motherboard.
The configuration also includes 32GB of Corsair Vengeance RAM running at 5,600MHz, a 1TB Corsair MP600 Pro PCIe 4.0 NVMe SSD, and a 1,500-watt, 80 Plus Platinum-certified Corsair HX1500i power supply. All tests were run on Windows 11 Pro with the latest updates installed.
Presented below is a summary of the cards we evaluated in comparison, along with links to the original reviews.
Synthetic Tests, Plus DLSS/FSR
The RTX 4090’s performance gains are remarkable, particularly in our synthetic tests, though its real-world gaming results are also impressive, as we’ll show shortly.
In FurMark, the RTX 4090 scored roughly seven times higher than the previous-generation card, an improvement that exceeds even Nvidia’s own performance claims. These results likely reflect the card’s maximum potential under ideal conditions.
In titles that support DLSS 2 and FSR, the results were mixed as to how much DLSS improved the RTX 4090’s performance.
The RTX 4090 significantly outperforms all the other cards we tested, particularly in F1 22. However, it gained less from DLSS than the other models did. This may be due to early driver issues, but CPU bottlenecking may also have played a role. That idea is somewhat hard to accept, given that the Intel Core i9-12900K is among the fastest gaming CPUs available, but we saw signs of it in several games.
The Core Issue: Evaluating Performance with Contemporary Video Games
In contemporary AAA games that don’t support DLSS or FSR, the RTX 4090 consistently beats all competitors. It often runs games at 2,560 by 1,440 (1440p) or even 4K faster than several other high-end GPUs run the same titles at 1080p.
AAA Game Tests
As noted earlier, the Core i9-12900K appears to bottleneck the RTX 4090 in certain cases. This is most noticeable in Far Cry 5, where the RTX 4090’s lead is smallest and its results at 4K and 1080p are comparable. Similarly, in Red Dead Redemption 2 and Shadow of the Tomb Raider, the RTX 4090’s 1080p and 1440p scores are closely aligned, further pointing to a CPU bottleneck.
There are other possible explanations for what we observed. Some game engines may not scale well beyond a certain point, and, as noted earlier, early driver issues could also be a factor. However, because three games from three different developers show the same behavior, a CPU bottleneck seems the most likely explanation.
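The reasoning above, that a near-flat result across resolutions points to the CPU rather than the GPU, can be written as a simple heuristic. The `looks_cpu_bound` function, its 10% threshold, and the FPS values are hypothetical and are not our measured results.

```python
def looks_cpu_bound(fps_low_res, fps_high_res, tolerance=0.10):
    """Flag a likely CPU bottleneck when raising resolution barely changes FPS.

    If the GPU were the limit, rendering more pixels should cut the frame
    rate noticeably; a near-flat result suggests the CPU (or engine) caps it.
    The 10% tolerance is an arbitrary, illustrative threshold.
    """
    drop = (fps_low_res - fps_high_res) / fps_low_res
    return drop < tolerance


# Hypothetical numbers, not our measured results.
print(looks_cpu_bound(fps_low_res=180, fps_high_res=172))  # True  (about 4% drop)
print(looks_cpu_bound(fps_low_res=180, fps_high_res=110))  # False (about 39% drop)
```

4K has four times as many pixels as 1080p, so a GPU-limited game should slow down noticeably at the higher resolution; when it doesn’t, something else is setting the pace.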
A Retrospective: Evaluating a Selection of Classic Games
The RTX 4090’s results in older games make a stronger case for a driver issue. Overall, the card performed poorly in our legacy game tests, which could point to a problem separate from the one seen in more modern titles. Either way, Nvidia should improve its driver support for older games.
Legacy Games Evaluation
Bioshock: Infinite was one of the few games in which the RTX 4090 did not decisively outperform its competitors; the GeForce RTX 3090 and RTX 3080 generally performed better. Hitman: Absolution was similarly unimpressive, with a clear bottleneck, although the RTX 4090 ran the game as smoothly at 1440p as at 1080p. It excelled at 4K in all three titles, though its lead there was not especially large.
An Examination of Power Consumption and Thermal Management
The highest full-system power draw we recorded with the RTX 4090, measured with our Kill-A-Watt meter, was 485 watts. That is comparable to the figure for the GeForce RTX 3070 Ti and significantly lower than those of the GeForce RTX 3090 and RTX 3080. Although the RTX 4090’s power draw was not as low as that of competing AMD cards, its performance gives it a considerably better performance-per-watt ratio than every other card listed, with the possible exception of the notably efficient Radeon RX 6600.
Given the signs of CPU bottlenecking in our testing, the RTX 4090 may draw more power with a faster CPU than our results suggest. The card is rated for 450 watts and can potentially exceed that figure significantly; in our tests, however, it did not.
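Performance per watt is simply average frame rate divided by power draw, which is how a card that draws more power can still be more efficient. In this sketch only the 485-watt figure comes from our testing; the frame rates and the second card’s wattage are hypothetical placeholders, not benchmark results.

```python
def perf_per_watt(avg_fps, system_watts):
    """Frames per second delivered for each watt of full-system power."""
    return avg_fps / system_watts


# Only 485 W is a measured figure; everything else is a placeholder.
cards = {
    "RTX 4090": (150.0, 485.0),
    "Hypothetical lower-power card": (90.0, 400.0),
}

for name, (fps, watts) in cards.items():
    print(f"{name}: {perf_per_watt(fps, watts):.3f} FPS/W")
```

In this made-up comparison, the card drawing 85 more watts still delivers more frames per watt, because its frame-rate advantage is proportionally larger than its power-draw disadvantage.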
The Nvidia GeForce RTX 4090 maintained a reasonable temperature during our extended FurMark test, reaching a maximum of 63 degrees Celsius, which is quite commendable.
The Conclusion: Raw Power, Provided You Can Accommodate It
The Nvidia GeForce RTX 4090’s immense power is truly remarkable, and no competitor on the market can currently match it. It is also astonishing to think that Nvidia could build an even more formidable GeForce RTX 4090 Ti on this GPU architecture, though it is too early to speculate about that.
This graphics card is built to deliver exceptional gaming performance at 4K, with headroom for 8K gaming in the future. If high-end gaming is your priority and budget is no concern, we see no compelling reason not to buy the GeForce RTX 4090. It is also less expensive than the Nvidia GeForce RTX 3090 Ti, and it should handle demanding games well for years to come.
Make sure your PC can accommodate this card: its high power consumption and sheer physical size can be challenging. You may need additional components, such as a larger case and a new power supply, to support it.