Da-Wei Lee

Machine Learning Researcher

jingle.ai

Biography

My name is Da-Wei Lee (David Lee). I am a former Data & Applied Scientist at Microsoft, currently working at a quant company on trading strategy research using machine learning. I enjoy being a maker: thinking up creative ideas and working hard to make them real. I love playing music and other cool stuff. I have great enthusiasm for learning and a curiosity for discovery. I am also very willing to help people and share what I have learned.

Interests

  • Artificial Intelligence
  • Natural Language Processing
  • Quant
  • Recommender System
  • Embedded System Design

Education

  • MEng in Software Engineering, 2021

    Peking University

  • BSc in Electronic and Computer Engineering, 2017

    National Taiwan University of Science and Technology

Skills

I learn everything I am interested in and work to master it.

Programming

Python for Machine Learning
C/C++ for Embedded System Design
Node.js for Back-end Design
Java for Android App. Dev.
C# for Unity Game Design
Verilog HDL for FPGA Design
Matlab, R for Math Calculation

Fluent in Vim

Music

Drum Kit
Guitar
Piano
Wind Band Percussion
Home Studio

Other Hobbies

Photography
Coffee (Latte Art & Pour-Over Coffee)
Skateboarding
Rubik’s Cube
3D Printing
Rollerskating
Bicycling
Skiing
Motorcycling
Drone

Publications

Open Relation Extraction via Query-Based Span Prediction

QORE utilizes a Transformers-based language model to derive a representation of the interaction between arguments and context, and can …

Open Relation Extraction with Non-Existent and Multi-Span Relationships

We proposed a Query-based Multi-head Open Relation Extractor (QuORE) to extract single/multi-span relations and detect non-existent …

Towards Topic-Aware Slide Generation For Academic Papers With Unsupervised Mutual Learning

Generating slides from papers using extractive summarization techniques and unsupervised mutual learning to address the lack of training data.

Life

You only live once, so YOLO!

Microsoft Suzhou Band - Return True

We formed at the FY22 Kickoff (Aug 2021) and are still playing today. I serve as drummer, acoustic guitarist, keyboardist, vocalist, recording engineer, mixer, … in this band.

Experience

Job / Intern


Machine Learning Researcher

jingle.ai

May 2023 – Present Shanghai, China
T0 Quant Machine Learning Strategy Research

Data & Applied Scientist

Microsoft Software Technology Center Asia - WebXT Bing Multimedia

Jul 2021 – Mar 2023 Suzhou, China
Recommender System for Video Recommendation.

Algorithm Intern

Microsoft Software Technology Center Asia - WebXT Bing NLP Carina

Jul 2020 – Jun 2021 Beijing, China

Worked on Writing Assistant related projects in two main parts:

  • AI Writer: An application that aims to increase the diversity of an article and reduce the human effort of writing “filler text”. (This was the project used for my intern conversion.)

    • Continuous Writing:

      • GPT2
    • Rewriting:

      • Paraphrasing

      • Back-translation

        • Bing Translator
        • Google Translate
      • Information-Retrieval-based

        • Elasticsearch
        • Approximate Nearest Neighbor (Annoy)
      • Style Transfer

        • Style Transformer
  • Value Understanding: Built a numerical extractor that extracts quantity facts from raw text.

    • Designed an annotation guideline specifically for Chinese quantity extraction.
    • Communicated with a labeling company and annotated more than 2,000 articles to construct the training dataset from scratch.
    • Designed two major approaches, namely “NER Combine” and “Quantity MRC”.

      • NER Combine: Combines the labeled spans extracted by an NER model using a scope-based, rule-based algorithm
      • Quantity MRC: Constructs a query for each slot based on the extracted quantity
    • Built post-processing modules that can handle complex sentences, especially “respectively” cases.

    • Used as the back end of three different projects

      • Writing Assistant (mainly finance): Including value consistency and value recommendation
      • Medical thesis analyser
      • A WeChat mini program


Interviewed 5 internship candidates (after getting the return offer).


Research Intern

Microsoft Research Asia - Knowledge Computing

Dec 2019 – May 2020 Beijing, China
Mainly took over two research-oriented NLP projects.

  • Generation of slides from academic papers
  • Math word problem generation

Research Intern (Laboratory)

Peking University National Engineering Research Center of Software Engineering

Jul 2019 – Jun 2021 Beijing, China

Worked on healthcare anti-fraud and medical record analysis cases.

This included research on:

  • Information Extraction
  • Named-entity Recognition
  • Relation Classification
  • Knowledge Graph

PKU Thesis: Design and Implementation of Chinese Document Numerical Fact Extraction


Embedded System Design Software Intern

Industrial Technology Research Institute (ITRI)

Jul 2016 – Aug 2016 Hsinchu, Taiwan
I was in the self-driving group, where I mainly handled the STV0991 development board, which was going to run the computer vision algorithms.

Piecework

Freelance / Personal Case


EEG Analysis

NTUST Department of Business Administration Professor

Sep 2018 – May 2019 Remote
I used MATLAB to process and analyze raw EEG data, and produced visualizations and animations of it.

Leapsy AR Glasses Video Stream Pan/Tilt Head

All Joint

Jul 2017 – Oct 2017 Taipei, Taiwan
I collected sensor data on Android-based AR glasses to capture their current attitude and sent it to a Raspberry Pi, which synchronized the direction of the camera pan/tilt head and returned the video stream to the glasses over Wi-Fi. I also built the pan/tilt head structure from a 3D-printed model to hold the camera and two servo motors, and designed the power supply circuit for both the motors and the Raspberry Pi.

ECG Analysis

NTU on-the-job Ph.D. Student

Nov 2015 – Feb 2016 Remote
I used MATLAB to apply a Fourier transform to ECG (electrocardiogram) signals, filter out the high-frequency noise, and finally predict the signal trend.

Oil Monitor System

Belton

Jul 2014 – Oct 2014 Remote
My college roommate Tom and I built a Windows application that reads the machine’s sensor values, displays them, and stores them in a database. We were asked to use Visual Basic for this project.

Competition


BeChangeMaker

WorldSkills

Mar 2023 – Sep 2023 Online

Ecojoy

We want to solve the problem of toy waste. Excessive pollution not only harms the physical environment of future generations but also raises children who do not cherish resources, which has a major impact on the world. We hope that, in a very simple way, old toys will no longer pile up at home or end up in landfill, but will instead become a resource for others. Our team has backgrounds in software engineering, social education, and economics. We have observed that the toy waste problem is becoming more and more serious: toys are readily available and cheap, which makes them a quick fix for most parents dealing with their children. We believe that as long as sharing and acquiring toys is simple enough, the situation of excessive waste can improve immediately. By subscribing as members of the Ecojoy app, users can easily share surplus toys from home, and through the app’s complete toy information and rating system, they can easily find suitable toys to meet their needs, achieving toy sharing and reuse.

Facebook Page


Jigsaw Unintended Bias in Toxicity Classification

Kaggle

Feb 2019 – May 2019 Online
This competition aims to classify whether a comment is toxic. Our team designed classifiers based on different models such as BERT and ELMo, and finally ensembled them. We ranked in the top 1%.

Failure Prediction of Concrete Piston for Concrete Pump Vehicles

Digital China Innovation Contest 2019

Jan 2019 – Mar 2019 Online

In this competition, each sample is a time series from a concrete pump vehicle. The goal is to predict, for each data sequence, the likelihood that the machine will fail. I used LightGBM and ranked in the top 5%.

Source Code


ARM Design Contest

ARM

Apr 2016 – Nov 2016 Hsinchu, Taiwan
Based on my departmental independent study project, the quadcopter project. We used the specified STM32F4 development board to drive the quadcopter and placed in the top 10 in the final.

HOLTEK MCU Design Contest

HOLTEK

Apr 2016 – Nov 2016 Taichung, Taiwan
Based on my departmental independent study project, the quadcopter project. We used the specified STM32F4 development board to drive the quadcopter and ultimately received an honorable mention.

NTU System App Contest

National Taiwan University System

May 2015 – Aug 2015 Taipei, Taiwan
Designed a platform called Skill Exchange, a talent and skill exchange platform that matches people based on their know-how and what they want to learn. We ultimately received an honorable mention.

NSYSU LED Design Contest

NSYSU EE

Oct 2014 – May 2015 Kaohsiung, Taiwan
An installation art LED grid ball that combined sound and light. This project was a collaboration with design department students. Gaming buttons trigger MIDI signals sent to a computer to produce sound, and an Arduino controls the LED grid. We ultimately received a merit award.

NTU Taiwan 2048 BOT Contest

NTU

May 2014 – Jul 2014 Taipei, Taiwan
My friend Tom and I built an AI BOT for the 2048 game. We used Monte Carlo Tree Search (MCTS) with alpha-beta pruning to select the best action, and scored each state (board) with an evaluation function of our own design. We ultimately received an honorable mention.

Accomplishments

Certifications

Intermediate Barista

Jiangsu Vocational Skills Certificate, No. S000032050806234001680
See certificate

TOEIC 785 / 990

Test of English for International Communication: Advanced
See certificate

Technician Certificate: Computer Maintenance class B

See certificate

Technician Certificate: Computer Maintenance class C

See certificate

Projects

Side Projects / Coursework / Source Code

Stanford CS224n NLP with DL

Self-study of the course, including projects on word2vec, dependency parsing, machine translation, and question answering.

SemEval-2013 Word Sense Induction

SemEval-2013 Task 13 Word Sense Induction for Graded and Non-Graded Senses.

SemEval-2018 Relation Classification

SemEval-2018 Task 7 Semantic Relation Extraction and Classification in Scientific Papers.

Operating System

PKU OS course project and notes based on Nachos and XV6

2048 AI BOT

An AI BOT for the 2048 game. Built the MCTS version in 2014 and rebuilt it as an RL version in 2018.

Raspberry Pi Cluster

An efficient quick-start tool for building a Raspberry Pi cluster with popular ecosystems such as Hadoop and Spark.

Deep Learning Practice

Neural network implementations. Course projects covering NLP, RL, and CV topics.

Machine Learning Practice

Implementations of machine learning algorithms from scratch, including course projects and notes related to statistical machine learning.

Modularized Quadcopter Architecture with Computer Vision Control

My departmental independent study. Built a quadcopter from scratch, running on different platforms and combined with CV. Earn school …

Leadership and Extracurricular Activities

Student Association of ECE Department

Served as artistic designer, handling poster design and Facebook fan page operations.

Junior High School Alumni Wind Band

Principal percussionist in my junior high school alumni wind band during the summers of 2016, 2017, 2018, and 2019.

Hsinchu Alumni Association

Served as photographer and social media manager. Served elementary school students in my hometown in 2014.

Cycling around Taiwan

Cycled counter-clockwise around Taiwan with a senior high school classmate in ten days.