Xu Hao

Algorithm Engineer, HPC

Detailed PDF Resume

Basic Information

Gender: Male
Age: 24
Phone: 18913-028-323
Email: haoxu.live@outlook.com

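Education

University of Edinburgh (QS #27), Edinburgh, UK

MSc, High Performance Computing with Data Science
Sep 2024 – Aug 2025

Core courses: Thread Programming (OpenMP), Advanced Message-Passing Programming (MPI), Accelerated Systems: Principles and Practice (CUDA), HPC Architecture, Performance Programming, Large-Scale Machine Learning, Fundamentals of Data Management, Practical Software Development.

University of Sussex, Brighton, UK

BSc, Computer Science and Artificial Intelligence (2+2 programme)
Sep 2022 – Jun 2024

Soochow University (Project 211), Suzhou, Jiangsu, China

BSc, Artificial Intelligence (2+2 programme)
Sep 2019 – Jun 2021

Core courses: Databases, Fundamentals of Machine Learning, Neural Networks, Computer Vision, Natural Language Engineering, Program Analysis.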

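Personal Profile

Bachelor's degree from a Project 211 university and master's degree from a QS #27 university, with formal training in artificial intelligence and high performance computing.
Undergraduate studies focused on artificial intelligence: machine learning, computer vision, and natural language engineering.
Master's studies focused on high performance computing: computer architecture, parallel computing, and GPU acceleration.
Teaching assistant and internship experience built strong problem-solving and communication skills.
Programming languages: Python, C/C++.
Technologies: Linux, LaTeX, Git, MySQL, MongoDB, OpenMP, MPI, CUDA, SYCL, Hadoop, Spark, PyTorch.
Several years of study in the UK; able to use English as a working language.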

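Work Experience

University of Sussex, UK · Teaching Assistant, Natural Language Engineering

Teaching Assistant · Oct 2023 – Dec 2023, Brighton, UK

Taught second-year undergraduates the core principles of the Natural Language Engineering course.
Ran lab sessions and guided students through the coding problem sets.
Answered questions from students who were struggling with the material.

University of Sussex, UK · Junior Research Associate

Full-time · Jul 2023 – Sep 2023, Brighton, UK

Segmented cell micrographs with Cellpose.
Used a VAE to try to generate complete four-channel images from cell micrographs with missing channels.
Classified cell-cycle stages.

Focus Technology Co., Ltd. · AI R&D Department

Internship · Sep 2021 – Jun 2022, Nanjing, Jiangsu, China

Used CNNs to identify images of prohibited products.
Detected sensitive icons with YOLO.
Reproduced an image matting method to refine segmentation edges and improve the sharpness of product images.
Trained a CLIP model for the Alibaba Tianchi multimodal product retrieval competition.

Nanjing Maiyuchuang Information Technology Co., Ltd. · Data Analysis Intern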

Internship · Jun 2021 – Aug 2021, Nanjing, Jiangsu, China

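Contributed to the development of a public-opinion analysis system.
Scraped and preprocessed news data with Scrapy.
Trained BERT for sentiment analysis and text classification.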

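Project Experience

Vision Transformer Training Acceleration · University of Edinburgh

Mar 2025 – Apr 2025

Project description: end-to-end performance profiling and optimization of ViT training on the Cirrus HPC platform.

Profiling: instrumented the CUDA/cuDNN and PyTorch pipeline with DLProf and Nsight Systems to locate CPU/GPU and memory bottlenecks.
Code optimization: enabled AMP and cudnn.benchmark, and moved loss accumulation from the CPU to the GPU, significantly improving kernel execution efficiency.
Parallel strategies: compared DP, DDP, and manual pipeline parallelism, testing scalability in single-node multi-GPU and multi-process settings.
Hyperparameter tuning: tested batch sizes (16/32/64) and num_workers (0/2/4/8), settling on batch size 64 and num_workers=4.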
Result: training throughput rose from ~6.2 samples/s (single GPU, FP32) to ~36 samples/s on 4 GPUs, roughly a 5.8× speedup, with no loss of accuracy.

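Parallel Optimization of a Brain Simulation System · University of Edinburgh

Mar 2025

Project description: parallelized a serial brain simulation code using an event-driven coordination pattern, implementing distributed simulation on a multi-node cluster in C with MPI.

Designed and implemented a reusable event-driven coordination framework that separates mechanism (event queues, scheduling) from policy (neuron model behaviour).
Used MPI_Allgatherv for cross-process signal exchange, with local queues handling Update/Signal events.
Supported single-handler and multi-handler modes, with dynamically resizing FIFO queues to optimize memory use and event ordering.
Ran weak-scaling and strong-scaling tests on the Cirrus cluster for four problem sizes: small, medium, large, and massive.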

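CUDA Parallelization of a Percolation Model · University of Edinburgh

Feb 2025

Project description: parallelized a 2D percolation model on the GPU by porting the serial CPU version to CUDA, then analysed its performance.

Mapped the 2D grid onto 16×16 thread blocks and optimized global memory access so that the model is updated in parallel.
Used atomic operations together with host-device synchronization to detect iteration termination correctly, and introduced pinned memory to speed up data transfers.
Benchmarked grid sizes from 512² to 16384²; achieved up to 52.8× speedup on an NVIDIA V100.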

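2D Decomposition of a Cellular Automaton · University of Edinburgh

Nov 2024 – Dec 2024

Project description: applied a 2D domain decomposition with MPI to a serial C cellular automaton program to parallelize and accelerate it.

Designed a 2D decomposition strategy for the cellular automaton problem and implemented it with MPI topologies.
Managed data distribution and collection manually.
Exchanged grid boundary data between processes using non-blocking communication.
Analysed the program's scalability experimentally across problem sizes and process counts.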
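
The 2D decomposition described above can be illustrated with a minimal Python sketch. It is not the project's C/MPI code: the function names `choose_dims`, `split`, and `local_block` are hypothetical, the near-square factorization only approximates what a routine such as MPI_Dims_create might choose, and non-periodic boundaries are an assumption. It shows how a global grid could be split across a 2D process grid and which ranks a process would exchange boundary data with.

```python
import math

def choose_dims(nprocs):
    """Pick a near-square (rows, cols) process grid with rows * cols == nprocs."""
    rows = math.isqrt(nprocs)
    while nprocs % rows != 0:
        rows -= 1
    return rows, nprocs // rows

def split(n, parts, index):
    """Return (start, size) of the index-th of `parts` contiguous chunks of n cells."""
    base, extra = divmod(n, parts)
    # The first `extra` chunks each take one leftover cell.
    size = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return start, size

def local_block(rank, nprocs, nx, ny):
    """Describe the sub-grid owned by `rank` in a row-major 2D process grid."""
    rows, cols = choose_dims(nprocs)
    r, c = divmod(rank, cols)
    x0, lx = split(nx, rows, r)
    y0, ly = split(ny, cols, c)
    # Neighbour ranks for boundary exchange; None marks an (assumed) non-periodic edge.
    neighbours = {
        "up": rank - cols if r > 0 else None,
        "down": rank + cols if r < rows - 1 else None,
        "left": rank - 1 if c > 0 else None,
        "right": rank + 1 if c < cols - 1 else None,
    }
    return {"coords": (r, c), "origin": (x0, y0), "shape": (lx, ly),
            "neighbours": neighbours}

if __name__ == "__main__":
    # Show the layout of a 100 x 100 grid over 6 processes.
    for rank in range(6):
        print(rank, local_block(rank, 6, 100, 100))
```

In an MPI program each rank would compute only its own block and then post non-blocking sends and receives to the neighbours listed here.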

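Literature Retrieval and Analysis System · University of Edinburgh

Oct 2024 – Apr 2025

Project description: built a one-stop academic literature search and analysis platform with Python/Flask and SQLite.

Designed and implemented the /search and /analysis backend endpoints, supporting search by keyword, author, or title as well as metadata statistics.
Called the arXiv API and persisted results to a SQLite database, balancing freshness with offline availability and reducing API call latency.
Used NLTK VADER to compute an overall sentiment score for each paper abstract and displayed sentiment labels in the search results to make filtering faster.
Extracted abstract features with TF-IDF, clustered them with KMeans, and reduced dimensionality with PCA to show the distribution of paper topics as a scatter plot in the front end.
Built an interactive interface with HTML, Tailwind CSS, jQuery, and ECharts that dynamically renders the result list, cluster plot, sentiment radar chart, and metadata statistics charts.
Containerized with Docker and deployed automatically via GitLab CI/CD; deployment efficiency improved by 80% and automated test coverage reached 85%.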
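
The idea of persisting API results to SQLite can be sketched as below. This is an illustration under stated assumptions rather than the project's code: the table name `search_cache`, the helper names, the one-hour expiry, and the `fetch` callable standing in for the real arXiv request are all hypothetical.

```python
import json
import sqlite3
import time

def open_cache(path=":memory:"):
    """Create (if needed) a small cache table keyed by the search query string."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_cache ("
        "query TEXT PRIMARY KEY, fetched_at REAL NOT NULL, payload TEXT NOT NULL)"
    )
    return conn

def cached_search(conn, query, fetch, max_age_s=3600.0, now=None):
    """Return results for `query`, calling `fetch(query)` only on a miss or stale entry."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT fetched_at, payload FROM search_cache WHERE query = ?", (query,)
    ).fetchone()
    if row is not None and now - row[0] <= max_age_s:
        return json.loads(row[1])  # fresh cache hit: no remote call
    results = fetch(query)  # hypothetical callable wrapping the remote API
    conn.execute(
        "INSERT OR REPLACE INTO search_cache (query, fetched_at, payload) "
        "VALUES (?, ?, ?)",
        (query, now, json.dumps(results)),
    )
    conn.commit()
    return results

if __name__ == "__main__":
    calls = []

    def fake_fetch(q):
        # Stand-in for a real API request; records how often it is called.
        calls.append(q)
        return [{"title": "Example paper about " + q}]

    conn = open_cache()
    cached_search(conn, "percolation", fake_fetch)
    cached_search(conn, "percolation", fake_fetch)
    print("remote calls made:", len(calls))
```

Keying the cache on the query string keeps repeated searches local, while the age check bounds how stale a stored result may be.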

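LLM-Based Revision Assistant · University of Sussex

Oct 2023 – Jun 2024

Project description: final-year project. Developed a web application that uses an LLM to generate quizzes automatically from user-uploaded books, notes, and lecture audio or video, helping users consolidate what they have learned.

Processed input files of various formats for the LLM, including transcribing audio and video files with the Whisper model.
Investigated methods for handling long LLM inputs, such as Map-Reduce and clustering, to generate quizzes.
Used RAG to retrieve relevant passages from the text and give feedback based on the user's incorrect answers.
Post-processed LLM output and built the web application with Streamlit.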
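
The Map-Reduce approach to long inputs mentioned above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the chunk sizes are arbitrary, and `map_fn` and `reduce_fn` are hypothetical stand-ins for the per-chunk and combining LLM calls.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap slightly to keep context."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def map_reduce(text, map_fn, reduce_fn, max_words=200, overlap=20):
    """Apply map_fn to every chunk, then combine the partial results with reduce_fn."""
    partial = [map_fn(chunk) for chunk in split_into_chunks(text, max_words, overlap)]
    return reduce_fn(partial)

if __name__ == "__main__":
    notes = " ".join("word%d" % i for i in range(450))

    def fake_questions(chunk):
        # Stand-in for an LLM call that writes one question per chunk.
        return ["Question about: " + chunk.split()[0]]

    def merge(question_lists):
        # Stand-in for a final LLM call that merges and deduplicates questions.
        return [q for qs in question_lists for q in qs]

    for question in map_reduce(notes, fake_questions, merge):
        print(question)
```

Each chunk fits within the model's context window, and the reduce step is where duplicate or low-quality questions could be filtered out.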

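Face Alignment and Landmark Detection System · University of Sussex

Feb 2023 – May 2023

Project description: designed, built, tested, and evaluated a system for face alignment, i.e. locating 44 facial landmarks in an image.

Built a cascaded model: the first stage predicts the coordinates of 5 landmarks, which are used to correct the face angle; the second stage predicts all 44 landmarks.
Applied extensive data augmentation to the dataset.
Experimented with a range of settings: model choice, data augmentation, MSE loss versus Wing loss, learning rate, batch size, and others.
Analysed results and failure cases quantitatively and qualitatively; the final NME was reduced to 5%.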
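
For readers unfamiliar with the two quantities named above, the sketch below shows the standard Wing loss formula and a normalized mean error (NME) computation in plain Python. It is illustrative only: the parameter values `w=10` and `eps=2` and the choice of normalizing distance are assumptions, not the settings used in the project.

```python
import math

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing loss over flat lists of predicted and target coordinates."""
    # Constant that joins the logarithmic and linear parts continuously at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        x = abs(p - t)
        # Logarithmic for small errors, linear (L1-like) for large ones.
        total += w * math.log(1.0 + x / eps) if x < w else x - c
    return total / len(pred)

def nme(pred_pts, true_pts, norm_dist):
    """Mean point-to-point distance divided by a normalizing distance."""
    dists = [math.dist(p, t) for p, t in zip(pred_pts, true_pts)]
    return sum(dists) / (len(dists) * norm_dist)

if __name__ == "__main__":
    predicted = [(10.0, 12.0), (30.0, 31.0)]
    actual = [(11.0, 12.0), (30.0, 33.0)]
    flat_pred = [v for pt in predicted for v in pt]
    flat_true = [v for pt in actual for v in pt]
    print("wing loss:", wing_loss(flat_pred, flat_true))
    # norm_dist might be, for example, the inter-ocular distance in pixels.
    print("NME:", nme(predicted, actual, norm_dist=20.0))
```

Compared with MSE, the logarithmic region gives small and medium errors more influence on the gradient, which is the usual motivation for trying Wing loss in landmark regression.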


Xu Hao

Algorithm Engineer | High Performance Computing Specialist

Detailed PDF Resume

Basic Information

Gender: Male
Age: 24
Contact: 18913-028-323
Email: haoxu.live@outlook.com

Education

University of Edinburgh (QS #27), Edinburgh, UK

MSc in High Performance Computing with Data Science
Sep 2024 – Aug 2025

Core courses: Thread Programming (OpenMP), Advanced Message-Passing Programming (MPI), Accelerated Systems: Principles and Practice (CUDA), HPC Architecture, Performance Programming, Large-Scale Machine Learning, Fundamentals of Data Management, Practical Software Development.

University of Sussex, Brighton, UK

BSc in Computer Science & Artificial Intelligence (2+2)
Sep 2022 – Jun 2024

Soochow University (Project 211), Suzhou, China

BSc in Artificial Intelligence (2+2)
Sep 2019 – Jun 2021

Key courses: Databases, Fundamentals of Machine Learning, Neural Networks, Computer Vision, Natural Language Engineering, Program Analysis.

Personal Profile

Bachelor’s from a Project 211 institution and a QS #27 master’s in HPC & Data Science.
Solid AI background during undergrad (ML, CV, NLE).
Master’s focus on computer architecture, parallel computing, and GPU acceleration.
Teaching assistant and internship experience; strong problem-solving and communication skills.
Programming: Python, C/C++.
Technologies: Linux, LaTeX, Git, MySQL, MongoDB, OpenMP, MPI, CUDA, SYCL, Hadoop, Spark, PyTorch.
Several years in the UK; professional proficiency in English.

Work Experience

Teaching Assistant – Natural Language Engineering | University of Sussex

Oct 2023 – Dec 2023, Brighton, UK

Lectured second-year undergraduates on core principles of Natural Language Engineering.
Led lab sessions and guided students through coding assignments.
Assisted students in overcoming challenges.

Junior Research Associate – ML for Cell Classification | University of Sussex

Jul 2023 – Sep 2023, Brighton, UK

Segmented cell micrographs using CellPose.
Used a VAE to reconstruct missing channels in four-channel cell images.
Classified cell cycle stages with CNN models.

AI R&D Intern | Focus Technology Co., Ltd

Sep 2021 – Jun 2022, Nanjing, China

Developed CNN models for detecting prohibited item images.
Applied YOLO for sensitive icon detection.
Implemented image matting techniques for edge refinement.
Trained CLIP models for Ali Tianchi’s multi-modal product retrieval challenge.

Data Analysis Intern | Nanjing Maiyuchuang IT Co., Ltd

Jun 2021 – Aug 2021, Nanjing, China

Built a sentiment analysis pipeline for public opinion monitoring.
Crawled and preprocessed news data using Scrapy.
Fine-tuned BERT for sentiment and text classification tasks.

Project Experience

ViT Training Acceleration | University of Edinburgh

Mar 2025 – Apr 2025

Profiled performance on the Cirrus HPC platform with DLProf and Nsight Systems.
Optimized code: enabled AMP, cudnn.benchmark, and moved loss accumulation to GPU.
Compared DP, DDP, and manual pipeline parallelism for scalability.
Increased throughput from ~6 samples/s on a single GPU to ~36 samples/s on 4 GPUs with no accuracy loss.

Parallel Optimization of Brain Simulation | University of Edinburgh

Mar 2025

Designed an event-driven coordination framework in C/MPI for distributed simulation.
Implemented MPI_Allgatherv for inter-process signal exchange.
Conducted weak and strong scaling tests on the Cirrus cluster across four problem sizes, from small to massive.

CUDA Parallelization of 2D Percolation Model | University of Edinburgh

Feb 2025

Mapped a 2D grid to 16×16 CUDA thread blocks; optimized global memory access.
Used atomic operations with host-device synchronization to detect iteration termination, and pinned memory to speed up data transfers.
Achieved up to 52.8× speedup on NVIDIA V100 across various grid sizes.

2D Decomposition of Cellular Automaton | University of Edinburgh

Nov 2024 – Dec 2024

Developed an MPI topology-based 2D decomposition for parallel CA simulation.
Managed data distribution and non-blocking inter-process communications.
Evaluated scalability over different problem sizes and process counts.

Literature Retrieval & Analysis System | University of Edinburgh

Oct 2024 – Apr 2025

Built a Flask/SQLite platform for academic paper search and analysis.
Integrated arXiv API with caching to reduce latency.
Applied NLTK VADER for abstract sentiment scoring.
Used TF-IDF + KMeans + PCA for topic visualization in interactive front-end charts.
Developed the UI with Tailwind CSS, jQuery, and ECharts; containerized with Docker and deployed via GitLab CI/CD.

LLM-Based Study Assistant | University of Sussex

Oct 2023 – Jun 2024

Processed diverse file formats and transcribed media via Whisper.
Explored Map-Reduce and clustering for long-text quiz generation.
Deployed a Streamlit web app with RAG and dynamic feedback.

Face Alignment & Landmark Detection | University of Sussex

Feb 2023 – May 2023

Designed a cascaded model for 5-point angle correction and 44-point landmark detection.
Performed extensive data augmentation and hyperparameter tuning.
Reduced NME to 5% through quantitative and qualitative analysis.
