Large Model Inference on Edge Devices (Master’s Graduation Project)
Duration: Dec. 2024 - Jul. 2025
Company: Signify
TL;DR: Enabled efficient on-device video captioning by applying quantization and partitioning to lightweight vision-language models, reducing memory usage while preserving accuracy.
Details
Applying large-scale Vision-Language Models (VLMs) to video captioning is limited by their high computational cost and memory footprint. This project investigates methods for efficient on-device inference using model quantization and partitioning. The work begins with an evaluation of lightweight VLMs, shortlisting candidates by open-source availability, reproducibility of reported results, and model size. Both post-training quantization and quantization-aware training are considered to shrink the models while maintaining captioning quality. Partitioning schemes such as encoder-decoder separation and sliding-window inference are proposed to distribute the inference load across edge devices. Experiments on the MSVD and MSR-VTT datasets show considerable savings in memory consumption with minimal loss in captioning quality. The work concludes with considerations for practical deployment and an outline of future directions in hybrid edge-cloud architectures for captioning.
Stack: Vision-Language Models (VLMs), Model Quantization, PyTorch, Model Partitioning
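To illustrate the idea behind post-training quantization mentioned in this project, here is a minimal sketch of symmetric per-tensor int8 quantization. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not the project's actual pipeline; the function names and the sample weights are assumptions made for illustration. Storing weights as 8-bit integers instead of 32-bit floats takes roughly a quarter of the memory, at the cost of a small rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Quantization-aware training follows the same mapping but simulates the rounding during training so the model can adapt to it.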
Rental Notification Service (MVP)
Duration: Jul. 2023 - Aug. 2023
Role: Independent Developer
I independently built and launched a subscription-based rental listing notification service that gained 103 paying subscribers and generated €2,530 in revenue within two months. Beyond development, I also handled product design and growth marketing, taking the project from idea to a validated business.
Stack: Python, Web Scraping, HTML, Webhook bot, Growth Marketing
Knowledge-Enhanced Text Representation Toolkit for Natural Language Understanding
Duration: Jun. 2022 - Jan. 2023
Organization: Chinese Academy of Sciences
TL;DR: CogKTR is an open-source toolkit for knowledge-enhanced text representation in NLU, unifying acquisition, representation, injection, and application of external knowledge under the Unified Knowledge-Enhanced Paradigm (UniKEP). The toolkit is open-source on GitHub, with an online demo and an instruction video.
Details
Modern NLP starts with text representation, converting discrete texts into continuous embeddings. While pre-trained language models (PLMs) excel at this and have advanced natural language understanding (NLU), they usually rely only on textual context, which is insufficient for knowledge-intensive tasks. Integrating external knowledge into PLMs can produce richer, more knowledgeable representations. However, existing knowledge-enhanced methods vary greatly, making them hard to reproduce, extend, or combine.
To address this, we introduce CogKTR, a knowledge-enhanced text representation toolkit based on our Unified Knowledge-Enhanced Paradigm (UniKEP). It includes four stages:
- Knowledge acquisition
- Knowledge representation
- Knowledge injection
- Knowledge application
CogKTR offers:
- Easy-to-use knowledge acquisition interfaces
- Multi-source knowledge embeddings
- Multiple knowledge-enhanced models
- Support for diverse knowledge-intensive NLU tasks
Stack: Python, PyTorch, BERT, Wikidata, WordNet, CogNet
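The four UniKEP stages listed above can be pictured with a toy pipeline. This is only a sketch under strong assumptions: it does not use CogKTR's real interfaces, and every name here (`TOY_KB`, `acquire`, `represent`, `inject`, `apply_task`) as well as the two-dimensional vectors are hypothetical stand-ins for a real knowledge source and a PLM embedding.

```python
# Toy stand-in for an external knowledge source (hypothetical data).
TOY_KB = {
    "paris": [1.0, 0.0],
    "apple": [0.0, 1.0],
}


def acquire(tokens):
    """Stage 1, acquisition: find tokens that have an entry in the knowledge source."""
    return [t for t in tokens if t in TOY_KB]


def represent(entities):
    """Stage 2, representation: map each entity to its knowledge embedding."""
    return [TOY_KB[e] for e in entities]


def inject(text_vec, knowledge_vecs):
    """Stage 3, injection: add the mean knowledge embedding to the text embedding."""
    if not knowledge_vecs:
        return list(text_vec)
    count = len(knowledge_vecs)
    mean = [sum(v[i] for v in knowledge_vecs) / count for i in range(len(text_vec))]
    return [t + m for t, m in zip(text_vec, mean)]


def apply_task(vec):
    """Stage 4, application: a trivial downstream classifier on the enhanced vector."""
    return "place" if vec[0] > vec[1] else "other"


tokens = "i flew to paris".split()
text_vec = [0.2, 0.3]  # placeholder for a PLM sentence embedding
enhanced = inject(text_vec, represent(acquire(tokens)))
label = apply_task(enhanced)
```

In this toy case the text-only vector alone would be classified as "other", while the knowledge-enhanced vector is classified as "place", which is the kind of effect knowledge injection aims for.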
Advertisement Strategy Optimization at Xiaomi
Duration: Mar. 2022 - May 2022
Company: Xiaomi (Internet Business Group)
TL;DR: Analyzed and optimized ad inventory, bidding, and fill strategies using performance metrics (eCPM, CTR, conversion rate), achieving +18% ad inventory, +3% revenue from bidding optimization, and +11% revenue from fill strategy optimization.
Details
Conducted data analysis on ad slot utilization within the Xiaomi content feed to identify underutilized inventory, expanding available ad space by 18% and boosting advertiser fill rate and revenue. Designed and ran A/B tests to optimize bidding algorithms and filtering strategies in video feeds, increasing eCPM and advertiser revenue by 3%. Evaluated and adjusted ad prioritization rules for video detail pages, improving eCPM and advertiser revenue by 11% through refined fill strategies.
Stack: Data Analysis, A/B Testing, eCPM Optimization, SQL, Digital Marketing
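For readers unfamiliar with the metrics above, the following sketch shows how eCPM, CTR, and the relative uplift between an A/B control and treatment group are commonly computed. The helper names and all numbers are hypothetical and chosen only for illustration; they are not data from the project.

```python
def ecpm(revenue, impressions):
    """Effective cost per mille: revenue earned per 1,000 ad impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return revenue / impressions * 1000


def ctr(clicks, impressions):
    """Click-through rate: share of impressions that led to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_uplift(control, treatment):
    """Relative change of the treatment metric over the control metric."""
    return (treatment - control) / control


# Hypothetical A/B test figures (not project data).
control_ecpm = ecpm(revenue=500.0, impressions=100_000)
treatment_ecpm = ecpm(revenue=515.0, impressions=100_000)
uplift = relative_uplift(control_ecpm, treatment_ecpm)
```

With these made-up inputs the treatment group's eCPM is 3% higher than the control's; in practice the difference would also be checked for statistical significance before rollout.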
Java Patch Detection to Cope with API Breaking Changes
Duration: Feb. 2024 - May 2024
Organization: TU Eindhoven
TL;DR: Built a static analysis pipeline using Maracas, GumTree, e-knife, and SDG to detect Java breaking changes and patches, improving accuracy and efficiency in software evolution.
Details
During software evolution, the dynamic relationship between dependencies and their dependants often leads to breaking changes that disrupt software functionality. This report explores a pipeline designed to detect and analyze these breaking changes using a combination of static analysis, code differencing, and code slicing tools. I integrate Maracas, GumTree, e-knife, and SDG to generate a comprehensive database of broken use slices and their corresponding patches. My methodology offers a structured approach to identifying and understanding the impact of breaking changes on client code. It contributes to the efficiency and accuracy of patch generation in Java software ecosystems.
Stack: Java, Maracas, GumTree, e-knife, SDG, Maven
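The core idea of detecting breaking changes and the client code they affect can be shown with a toy example. The sketch below is not how Maracas or the other tools work internally (they analyze real Java code); it compares two hand-written API descriptions, and the dictionaries, method names, and helper functions are all hypothetical.

```python
def breaking_changes(old_api, new_api):
    """Compare two API versions given as {method name: signature} dictionaries."""
    changes = {}
    for name, signature in old_api.items():
        if name not in new_api:
            changes[name] = "removed"
        elif new_api[name] != signature:
            changes[name] = "signature changed"
    return changes


def broken_uses(client_calls, changes):
    """Return the client call sites (line, method) that hit a breaking change."""
    return [(line, name, changes[name]) for line, name in client_calls if name in changes]


# Hypothetical library versions and client call sites.
api_v1 = {"parse": "(String) -> Node", "render": "(Node) -> String", "close": "() -> void"}
api_v2 = {"parse": "(String, boolean) -> Node", "render": "(Node) -> String"}
client_calls = [(12, "parse"), (20, "render"), (31, "close")]

changes = breaking_changes(api_v1, api_v2)
broken = broken_uses(client_calls, changes)
```

Here `parse` has a changed signature and `close` was removed, so the client's calls on lines 12 and 31 are flagged while the call to `render` is unaffected. A real pipeline would then slice the surrounding code of each broken use and pair it with the patch that fixed it.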