low-level / hardware-focused software engineer

Artur Bieniek

Tools Team, Antmicro · Wrocław, Poland

I work close to the boundary between software and hardware: Verilator tooling, performance-sensitive C++, FPGA and digital design flows, CUDA kernels, Linux systems, and practical hardware debugging.

QOTD: "You can't communicate complexity, only an awareness of it." -- Alan J. Perlis

$ ./artur --focus Verilator, FPGA, CUDA, Linux systems

$ git log -1
fix simulator edge cases without giving up performance

$ grep debug work.log
debug hardware, measure bottlenecks, verify behavior

arturbieniek.dev
GitHub · LinkedIn · ar2rekb [at] gmail.com

current work

Systems that need to be correct and fast.

My work is centered on low-level tooling, simulation behavior, and workflows where small correctness or performance mistakes have a large blast radius.

Antmicro · Tools Team

Jun 2025 - present

Developing C++ tooling around Verilator and topwrap for digital design, simulation, and verification workflows.

  • Implemented full support for force/release while preserving simulation performance (verilator/verilator#7391).
  • Extended Verilator functionality to support UVM 2017 flows without user-side workarounds.
  • Ongoing work on performance-sensitive simulation infrastructure and hardware verification tooling.

Independent hardware work

ongoing

Hands-on diagnostics, repair, stability testing, and custom system building for PCs, servers, GPUs, RAM, and Linux-based setups.

  • Debug hardware faults from symptoms through stress testing and validation.
  • Build and maintain custom compute and server systems.
  • Combine software-level instrumentation with physical hardware diagnosis.

selected work

Projects with measurable constraints.

A small set of work that reflects the same pattern: understand the system, find the bottleneck, and make the implementation match the hardware.

SHA3x CUDA hasher for Tari

CUDA-based SHA3x hasher reaching 700 MH/s on an RTX 4070 Super by tuning kernel structure and memory access patterns for throughput.

CUDA · C/C++ · RTX 4070 Super

SHA3x FPGA hasher

FPGA implementation of a SHA3x hasher achieving 33 MH/s on a DE1-SoC, using a multicycle design to reduce critical path pressure.

FPGA · SystemVerilog · DE1-SoC

Low-latency network proxy and analyzer

TCP/UDP proxy for real-time packet inspection and modification, with binary protocol reverse engineering and live application-layer payload manipulation.

C/C++ · Python · TCP/UDP

ARM phone compute cluster

Built a low-cost distributed compute cluster from more than 80 smartphones with 8-core ARM CPUs, managed with Linux tooling and custom maintenance workflows.

Linux · ARM systems · distributed compute

open source

Recent public work.

Public GitHub activity reflects a current emphasis on Verilator internals, simulation correctness, and practical tooling.

Verilator contributions

Recently merged work includes fixes for property expression literal width, virtual interface method scheduling, and assign/deassign behavior.

2026 active upstream PR work
C++ · simulation internals

sha3x_cudaminer

Public CUDA mining implementation and performance-oriented hashing work, archived after the core experiment was completed.

CUDA · hashing · optimization

BTS2026 CTF writeups

Public writeups from Break The Syntax 2026, showing applied debugging, exploit reasoning, and careful technical communication.

CTF · debugging · writeups

toolbox

Low-level by default.

The useful overlap: systems programming, digital design, hardware debugging, and enough infrastructure work to keep the whole path observable.

Languages

C++ · C · SystemVerilog · Python

Systems

Linux · TCP/UDP · ARM systems · server hardware

Hardware and embedded

FPGA · Raspberry Pi · ESP32 · ATmega32

Tools

Verilator · topwrap · CUDA · GDB · Wireshark · Valgrind · ASAN · Git

education

University of Wrocław.

M.Sc. in Computer Science

Feb 2026 - Feb 2027 (expected)

Planned thesis topic: reducing DRAM refresh rates by identifying and mitigating leaky cells.

B.Eng. in Computer Science

Sep 2023 - Feb 2026

Completed in 2.5 years with a grade of 4.31/5.0. Thesis: implementing full support for force/release in Verilator.