Boyang Yue
Software Engineering, Big Data, and the miscellaneous
03
May 2025
Python Distributions, Native Dependencies, and Environment Boundaries
2,133 words, ~10 min read
Dependency Management
Reproducibility
Python
uv
conda
How dependency shape separates uv from conda.
06
Aug 2023
Why Experiment Wins Underdeliver
4,211 words, ~21 min read
Data Science
A/B Testing
Statistics
Causal Inference
The launch gap between measured lift and real impact.
25
Feb 2023
Crop Out: Exploring Digital Image Processing
913 words, ~4 min read
Image Processing
Pillow
Python
Trick
How to process images programmatically.
18
Dec 2022
Efficient Similarity Search with FAISS
1,875 words, ~9 min read
FAISS
Apache Spark
NumPy
Performance Optimization
Similarity Search
Unlock the full potential of large-scale vector searches.
11
Sep 2022
Two Measures of Fast: Throughput and Latency
2,738 words, ~13 min read
Performance Optimization
System Design
Benchmarking
Where the tradeoff is a choice, where it is arithmetic, and where there is none.
16
Jan 2022
The Comprehensive Guide to Hive UDF
2,534 words, ~12 min read
Apache Hive
Apache Spark
UDF
Java
A guide to developing Hive UDFs and integrating them with Spark SQL.
08
Oct 2021
When the Python Interpreter Is the Bottleneck
1,833 words, ~9 min read
Hybrid Programming
Performance Optimization
Python
C++
cppimport
Compile a small C++ kernel on import after profiling identifies the hot loop.
15
Jun 2021
From MapReduce to Spark: Execution and Programming Models
2,764 words, ~13 min read
Data Engineering
Distributed Systems
Apache Spark
Apache Hadoop
MapReduce
Job boundaries, persistence, scheduling, and SQL optimization.
30
Nov 2020
Recursion, Iteration, and the Hidden Stack
3,313 words, ~16 min read
Theory of Computation
Functional Programming
Haskell
Python
Accumulators, continuations, trampolines, and conversion costs.
27
Jun 2020
Markdown Syntax in a Nutshell
1,808 words, ~9 min read
Markdown
HTML
Cheat Sheet
A concise tutorial on the most common Markdown syntax.