FEX release FEX-2508 brings large JIT performance improvements, most notably call-return stack optimizations, along with NX bit support, better handling of the self-modifying code used by anti-tamper and anti-debugger schemes, hardware TSO support for the WINE wow64/arm64ec libraries on Apple Silicon, and per-commit uploads of the WINE DLL artifacts. The raw change list also includes cleanups that remove pair and tuple usage from several components, a whole-tree reformat with clang-format-19, and a Readme update that lists Ubuntu 25.04 as supported.
FEX Release FEX-2508
Read the blog post at FEX-Emu's Site!
You thought we were done with optimizations? Too bad, we had some massive improvements this month! Let's jump in!
Big juicy JIT optimizations!
It's hard to overstate how much performance has been lifted this month. To start off, let's show off some performance graphs for a handful of games!
<---FPS% uplift--->
And a chart for the averaged FPS numbers recorded from each of these games.
<---FPS uplift--->
As you can see from the tested games, the improvements can be wild depending on what the game is doing! A nearly 39% FPS uplift in Cyberpunk 2077 is wacky! From the various testing we have done, the uplift tends to be closer to Cyberpunk's, but there are of course other games like God Of War where the uplift is minimal!
The majority of this performance uplift has come from call-return stack optimizations, where we are now able to take advantage of the ARM CPU's own call-return prediction hardware, but we have had a variety of optimizations this month that improve both JIT compilation time and execution time! Additionally, we now compile significantly less code, since we used to get a combinatorial explosion of JIT compiles when multiblock was enabled. We have now made it so each individual block of JIT code is freestanding and usually only gets compiled once.
Another improvement this month is that the WINE wow64/arm64ec libraries can now take advantage of Apple Silicon's hardware TSO feature. This happened to not be implemented in the wow64/arm64ec code path. This will significantly improve performance on that hardware in the case that someone spends the effort to run a game in that environment.
There are a few other JIT improvements, but we could spend all day here if we talked about everything! Have fun gaming with the performance improvements!
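To give a feel for why pairing calls with returns matters, here is a minimal sketch of a return-address stack predictor. It is a toy model written for illustration, not FEX code: the class `ReturnStackPredictor`, the function `run_trace`, the `TRACE` data and the stack depth of 16 are all assumptions made up for this example. It assumes that a JIT which turns guest calls and returns into plain and indirect branches never feeds the predictor, while one that emits real host call/return pairs does.

```python
class ReturnStackPredictor:
    """Toy model of a CPU return-address stack (hypothetical, not FEX code)."""

    def __init__(self, depth=16):
        self.depth = depth  # assumed capacity; real hardware varies
        self.stack = []
        self.hits = 0
        self.misses = 0

    def on_call(self, return_address):
        # A call pushes the address execution should come back to.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # the oldest entry is lost on overflow
        self.stack.append(return_address)

    def on_return(self, actual_target):
        # A return pops the prediction and compares it with the real target.
        predicted = self.stack.pop() if self.stack else None
        if predicted == actual_target:
            self.hits += 1
        else:
            self.misses += 1


def run_trace(trace, paired):
    """Replay (kind, address) events through the toy predictor.

    paired=True models a JIT that emits host call/return pairs.
    paired=False models a JIT that emits plain and indirect branches,
    so the return stack never sees the calls and cannot help the returns.
    """
    ras = ReturnStackPredictor()
    for kind, address in trace:
        if kind == "call" and paired:
            ras.on_call(address)
        elif kind == "ret":
            if paired:
                ras.on_return(address)
            else:
                ras.misses += 1  # not covered by the return stack
    return ras


# Three nested calls that return normally, then one call whose return
# address was changed by the guest before returning.
TRACE = [
    ("call", 0x1004), ("call", 0x2008), ("call", 0x300C),
    ("ret", 0x300C), ("ret", 0x2008), ("ret", 0x1004),
    ("call", 0x4000), ("ret", 0x5000),
]
```

Running `run_trace(TRACE, paired=True)` lets the model predict the three well-behaved returns and miss only the tampered one, while `run_trace(TRACE, paired=False)` gets no help from the return stack at all. Real hardware also has an indirect branch predictor that can still guess some of those targets, so treat the gap as an illustration of the idea rather than a measurement.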
Implement NX bit
This is a fun little security feature which prevents games from executing code that isn't mapped executable. This feature has been around in hardware for a long time, but FEX has finally implemented it! This fixed a single game that we know of, which tests to ensure this security feature is enabled. This usually isn't a problem for most games, but it is kind of funny that a game using NaCl didn't work because of it.
More anti-debugger/tamper improvements
This month has also brought a bunch of improvements around the behaviour of code that only shows up in anti-tamper or anti-debugger code. Specifically, we found that Peggle Deluxe and Crysis 2: Maximum Edition usually worked under FEX, but they relied on some subtle self-modifying code that only happened to work. This is either for anti-tamper, or anti-debugging, or maybe even a way to block data mining. We don't know for sure, but since the games rely on it, we just need to support those forms of self-modifying code.
This may also happen to get some versions of Denuvo anti-tamper working under FEX, but it isn't guaranteed and will depend on the game and the particular flavour of that anti-tamper software.
Upload WINE DLL artifacts
This is a minor thing, but we are trying out uploading the WINE wow64/arm64ec DLL files for every commit. These can be found on our GitHub Actions page. This is for people that want to tinker with the main branch under arm64 WINE. As usual, we recommend the official releases on our Launchpad PPA, but adventurous users always want more.
Raw changes
64BitAllocator
- Removes pair usage in allocator ( e80270a)
ARM64EC
- Rely on syscall export sorting for deriving their IDs ( b2eeaf7)
ASIMDOps
- Remove unused permute overloads ( 0457bdc)
- Move remaining base opcodes into implementing function ( 7e54c27)
- Move most remaining base opcodes into their implementing functions ( ba85dbf)
- Constrain more instructions with IsQOrDRegister ( 318b311)
- Move base opcode into ASIMDModifiedImm()/ASIMDShiftByImm() ( a190086)
- Move base opcode into ASIMD2RegMisc() ( 23d0d7d)
- Move base opcode into implementation function for some categories ( e49155f)
- Merge half-float 3-reg same with single/double variants ( aec7dec)
- Merge half-float 2-reg misc with single/double variants ( c1fe841)
SVEOps
- Use IsStandardFloatSize() even more ( 564e966)
ArchHelpers
- Remove pair usage in unaligned handler ( 493a7cc)
Arm64
- Remove pair usage in 128-bit loader ( cd56b85)
Arm64Emitter
- Fix signed overflow ( 170fbb5)
CMake
CPUID
- Update documentation comments ( b0c74e1)
ELFContainer
- Remove tuple usage ( 57c0d0f)
EmulatedFiles
- cpuinfo
  - Add a few missing flags ( e71e10a)
External
FEXCore
- Remove unused refcount_shared_mutex ( 90b9414)
- Replace CustomIREntry tuple with struct ( 7877068)
- Remove reference SHA implementation ( bffb812)
- Reintroduce support for CSSC ( 5123ca5)
- Implement a mostly NOP implementation of waitpkg ( 3456686)
- Implement support for NX bit. ( 35e4ac5)
- Implement xsaveopt ( 2a37289)
Frontend
- Ensure multiple prefix bytes work ( 265525e)
OpcodeDispatcher
FileManagement
- Remove pair usage from GetEmulatedFDPath ( c0762bb)
Frontend
IR
- Remove tuple usage from NodeIterator ( c0008af)
InstcountCI
- Fix bad encoded moffset instructions ( cf43e5e)
InvalidationTracker
- Fix queries in the non-intersecting RWX case ( 8103cdb)
JIT
- Add more padding ( f5efe1d)
LibraryForwarding
- wayland
  - Add method signatures required by steam-runtime-launch-options ( fc1ca01)
OpcodeDispatcher
PoolBufferWithTimedRetirement
- Unclaim in dtor ( e669370)
Profiler
- Decouple profile stats from the profiler option ( 976b68f)
RegisterAllocationPass
- defer next-use analysis ( e6de17e)
SVEOps
- Make use of IsStandardFloatSize() more ( dd5c172)
SecondaryGroupTables
- Specify FLAGS_SF_MOD_REG_ONLY for more ops ( f44af48)
VectorOps
- Correct benign VFAddV op cast ( b72bdf9)
VixlUtils
- Fix null pointer dereference vector in IsImmLogical() ( ebd0a3c)
WOW64
- Wrap BTCpuSimulate to ensure correct unwinding ( d226331)
WinAPI
- Avoid sign-extension of processor count in GetSystemInfo ( 98abfa4)
Windows
XXFileHash
Misc
- Implement inline SMC handling for linux FEX ( a5260f4)
- Runtime mode switch take 2 ( 369ca5c)
- Pool constants as we go ( 123f8c9)
- Readme.md: Mention Ubuntu 25.04 as supported ( 53ff6d5)
- Implement call-ret stack optimisations ( bf8b5ed)
- Implement x87 invalid operation bit on F80 mode ( 525462e)
- Inline SMC fixes and support for WOW64 ( 507b3a6)
- External/code-format-helper: Update requirements ( 3387f51)
- Add --changed flag to reformat.sh script ( 251ac14)
- Whole-tree reformat with clang-format-19 ( e17c2c8)
- Disable GCS in simulator and userspace ( 1d4b6c6)
- Ensure .clang-format-ignore is compatible with clang-format-19 ( befae52)
- Drop unnecessary uses of maybe_unused ( 5b0e703)
- Miscellaneous log changes ( 05742a8)
- code-format-helper: Fix formatting for the format helper ( 88f30af)
- Drop TODO defines ( e6f319d)
- Set clang-format-19 as the version git-clang-format should run in wor… ( 70ce323)
- Support multiple entrypoints into a multiblock and executable permission tracking ( 2e7b86e)
- ( 5d4bddc)
- Upgrade to clang-format-19 ( 67bfd38)
- Fix divisor masking ( d8a4d03)
cmake
- Move legacy binfmt arch-specific targets to combined ( ac42034)
github
- Upload armtifacts ( c6733a6)
unittests
x87StackOptimizationPass
- Removes pair usage ( 5db494d)
