I dream of becoming a software engineer who can create meaningful value and make a profit from it.
By working in the software/data domain and gaining valuable experience, I want to become a professional in this
field.
I consistently endeavour to become a capable software engineer by allocating my time across various software
engineering domains, ranging from full-stack programming and SQL to data structures and algorithms.
I also invest my time in data engineering areas such as ETL and data warehousing.
Because the IT industry is fast-moving and agile, I believe it is important for software/data engineers to keep
track of current and future IT trends, prepare for new technologies, and widen their tech stack. Therefore, I
constantly study and build personal projects around emerging technologies such as blockchain and Web 3.0.
Furthermore, I believe familiarity with another current trend, data analytics/machine learning, is also beneficial
for a software/data engineer, even though software/data engineering does not typically include data analytics.
I therefore gained hands-on deep learning experience through several projects during my university studies.
As a junior software/data engineer, I may lack the skills and experience of senior experts, but I am a fast
learner and a passionate, rapidly growing engineer who wants to become a professional in this domain.
In my leisure time, I consistently make an effort to become an expert in the IT domain.
I graduated from University College London in 2021 with an MEng in Computer Science with First Class Honours, and
I am currently looking to work as a data engineer.
• Supervised an ACS (Auto Configuration Server) service consisting of Node.js (Koa), MongoDB, and Redis, both on-prem and in the cloud.
• Delivered the on-prem to AWS cloud migration of the ACS service using Docker, AWS ECS, Cognito (OAuth 2.0), and DocumentDB.
• Created a multi-stage Dockerfile to eliminate internal app dependencies and used an npm registry to handle private Git repositories.
• Built a centralised logging and monitoring pipeline with Kafka, Fluent Bit, and Vector, shipping logs to AWS OpenSearch.
• Deployed ACS services to AWS with Terraform and automated the CI/CD pipeline in GitLab so that every code change is built and deployed.
• Managed a mobile KI service that automates SIM card registration, consisting of Linux Bash scripts and Python running on-prem.
• Delivered full-stack development of a portal for internal clients that interacts with ISP service APIs, using React and Node.
During these 9 months, I worked as a data engineer and was mainly in charge of creating and maintaining
ELT/data pipelines for the company's product data. These pipelines were used by BI teams to create
dashboards and by the Data Science team for data analysis. The main language used for the data pipelines was
Scala with Apache Spark, and the distributed computing cluster was managed by AWS EMR.
There were various data sources, and each one was orchestrated with Apache Airflow.
One of the main data pipelines I created was for promotion data. The promotion data arrived as raw JSON
files, which I converted to Parquet with a schema better suited to the Data Science team's analysis. As a
result, the promotion data accuracy increased by 5%.
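The production pipeline was written in Scala with Spark; below is a minimal PySpark sketch of the same kind of JSON-to-Parquet conversion, where the S3 paths, column names, and cleaning steps are illustrative assumptions rather than the actual pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("promotion-pipeline").getOrCreate()

# Read raw promotion JSON from the data lake (placeholder path).
raw = spark.read.json("s3://example-data-lake/raw/promotions/")

# Illustrative cleaning: rename to a consistent schema, cast types, drop bad rows.
curated = (
    raw.withColumnRenamed("promoId", "promotion_id")
       .withColumn("start_date", F.to_date("start_date"))
       .filter(F.col("promotion_id").isNotNull())
)

# Write Parquet partitioned by date for the Data Science team to query.
curated.write.mode("overwrite").partitionBy("start_date").parquet(
    "s3://example-data-lake/curated/promotions/"
)
```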
The raw and transformed data were stored in AWS S3 as a data lake; they could be queried with AWS Athena, and
their schemas were managed with AWS Glue.
AWS Redshift was used as the data warehouse, with its source data managed in AWS S3, and I contributed to
building data pipelines and data migrations for data warehousing.
I used AWS EMR Studio to validate the data pipelines I created and the data migrations, checking for
duplicates, empty/null values, row counts, column names, and so on.
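As an illustration, the checks listed above can be expressed as a short PySpark validation script; the dataset path, key column, and expected schema below are assumptions made for the sake of the example.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pipeline-validation").getOrCreate()
df = spark.read.parquet("s3://example-data-lake/curated/promotions/")  # placeholder path

# Row count and duplicates on the assumed primary key.
total = df.count()
duplicates = total - df.dropDuplicates(["promotion_id"]).count()

# Null or empty values per column.
nulls = df.select([
    F.count(F.when(F.col(c).isNull() | (F.col(c).cast("string") == ""), c)).alias(c)
    for c in df.columns
]).first().asDict()

# Column-name check against the expected schema (illustrative list).
expected = {"promotion_id", "start_date", "discount_rate"}
missing = expected - set(df.columns)

print(f"rows={total}, duplicate_keys={duplicates}, missing_columns={missing}")
print(f"null_or_empty_counts={nulls}")
```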
I also contributed to SQLRunner (a program that runs config files consisting of SQL commands) on a Linux
server, adding and editing config files and managing them with Linux commands.
Lastly, I gained some experience using Kafka for streaming data within the data engineering team.
During the first 3 months of my internship, the company did not yet have a separate department for data
science work, so I worked in a software development team.
Fortunately, I worked as a data engineer within that team and mainly dealt with the data mining process
rather than web application development.
As the company needed to gather data from the partners it had contracted with, the data mining work involved
automating the data crawling, cleaning, and mapping processes.
Balaan wanted to gather its partners' product information and categorise the products according to the
colour, origin, clothing category, and brand definitions the company had established.
The data cleaning and mapping processes therefore involved mapping relatively unstructured data onto the
structured format the company had defined.
Furthermore, I developed four mapping tables (colour, origin, clothing category, and brand) that enable the
Sales Team to modify the features the company defined.
The mapping process described above was driven by these mapping tables in the database, and I developed a
CRUD system that allows the Sales Team to modify them without any database knowledge.
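A toy pandas illustration of how such a mapping table drives the categorisation step (the table contents and column names are assumptions, not Balaan's actual schema):

```python
import pandas as pd

# Raw partner product feed and a Sales-Team-maintained colour mapping table (illustrative).
products = pd.DataFrame({"name": ["Wool coat", "Denim jacket"],
                         "raw_colour": ["ivory", "light blue"]})
colour_map = pd.DataFrame({"raw_colour": ["ivory", "light blue"],
                           "colour": ["white", "blue"]})

# Join the raw value against the mapping table to obtain the company-defined category;
# because the Sales Team edits the mapping table through the CRUD tool, no code change
# is needed when a new raw value appears.
mapped = products.merge(colour_map, on="raw_colour", how="left")
print(mapped[["name", "colour"]])
```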
A month before I left the company, a Data Team was created, and I had the chance to work in this new department.
The goal of the team was to analyse data and extract meaningful insights into customers that would benefit
the company.
As a newly formed department, its first mission was to build a system that aggregates the company's business
data and visualises it to support decision making.
We used AWS cloud services for this task, as AWS provides useful capabilities such as data storage and
visualisation with flexible scalability.
The resulting system consisted of three AWS services: S3, Athena, and QuickSight. S3 was used to store the
gathered data, while Athena and QuickSight were used for querying and visualisation.
I developed a data pipeline for Google Analytics (GA) data, automating both the GA data retrieval and the
data transfer process.
The metrics and visualisations produced by this system were used to evaluate the company's performance and
profits and to support its decision making. They were also displayed in the office so employees could check
their weekly goals and progress.
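A minimal sketch of the transfer step in that pipeline (the bucket name, key layout, and export file are placeholder assumptions; the GA retrieval call itself is omitted):

```python
import datetime
import boto3

def upload_ga_export(local_path: str, bucket: str = "example-analytics-bucket") -> None:
    """Upload a daily GA export to S3 so Athena/QuickSight can query it."""
    key = f"ga/daily/{datetime.date.today().isoformat()}/report.csv"  # placeholder layout
    boto3.client("s3").upload_file(local_path, bucket, key)

upload_ga_export("ga_report.csv")  # placeholder local export file
```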
After completing my A-levels and before starting university, I ran one-to-one teaching sessions, 12 hours a week, for three prospective A-level students. I arranged the time slots for each student, prepared all the teaching materials, and handled administrative tasks such as renting a classroom. Because each student had a different level of knowledge in Physics, I adopted a different teaching approach for each student according to their strengths and weaknesses.
As a personal project, I used React to create a web application that introduces UK
secondary school education (A-levels) to Korean students and allows them to find one-to-one online
lessons with tutors in A-level subjects.
Front-end:
- used Bootstrap to implement the front-end web design
- used Redux and external APIs to implement login and registration functions, including Google Login
- used external APIs to implement interactive designs such as a carousel
Back-end:
- used Node.js with Express to connect the React SPA to the back end and to a MongoDB database
- implemented CRUD operations for user data and stored them in MongoDB
As part of the 'Information Retrieval and Data Mining' coursework, I implemented information retrieval
models that return a ranked list of documents relevant to a query.
There were two pieces of coursework in the module:
1. Implement an inverted index and apply tf-idf vector weighting to queries and documents, then implement
BM25 and a query likelihood language model. For the query likelihood language model, I also implemented
Dirichlet smoothing, Laplace smoothing, and Lidstone correction (a BM25 scoring sketch follows after this list).
2. Using pre-trained dense word embeddings (GloVe), implement ranking models with Logistic Regression,
LambdaMART, and RankNet.
I used Python with pandas, NumPy, NLTK, and TensorFlow/Keras in Jupyter Notebook.
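Below is a simplified from-scratch sketch of the BM25 scoring used in the first coursework, run over a toy corpus; the parameter values and tokenisation are assumptions.

```python
import math
from collections import Counter

def bm25_score(query, doc_tokens, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """Score one document against a query with (simplified) BM25."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query:
        if term not in tf:
            continue
        idf = math.log((num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len))
    return score

# Toy corpus and query for illustration only.
docs = [["information", "retrieval", "models"],
        ["deep", "learning", "for", "retrieval"]]
doc_freq = Counter(t for d in docs for t in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
query = ["retrieval", "models"]
ranking = sorted(range(len(docs)), reverse=True,
                 key=lambda i: bm25_score(query, docs[i], doc_freq, len(docs), avg_len))
print(ranking)  # document indices, best match first
```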
As part of the 'Statistical Natural Language Processing' coursework, we (a group of four including me) carried out a comparative analysis of deep-learning-based news topic classification models. We compared five deep-learning-based techniques for news topic classification: RNN, GRU, LSTM, BiGRU, and fine-tuned BERT. Through transfer learning, BERT has achieved state-of-the-art performance in many NLP tasks with less data and computation time than traditional models, so the primary aim of this study was to explore the state-of-the-art pre-trained language model, BERT, on news classification.
As part of the 'Introduction to Deep Learning' coursework, I implemented various unsupervised and supervised deep learning models on the Fashion MNIST dataset using Python (TensorFlow/Keras). There were three models: (1) logistic regression without deep learning libraries, (2) a convolutional denoising autoencoder, and (3) a multi-label classifier built on a pre-trained convolutional autoencoder. For the first task, I implemented logistic regression (a simple MLP) and its gradient descent algorithm from scratch, without any deep learning libraries. For the second task, I implemented a convolutional denoising autoencoder that restores noisy images to the original images. Lastly, for the third task, I first trained a convolutional autoencoder and then reused its pre-trained encoder in a multi-label classifier to improve overall performance.
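A compact Keras sketch of a convolutional denoising autoencoder of the kind described for the second task; the layer sizes, noise level, and training settings are illustrative assumptions rather than the coursework's exact architecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Load Fashion MNIST and corrupt the inputs with Gaussian noise.
(x_train, _), _ = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_noisy = np.clip(x_train + 0.3 * np.random.normal(size=x_train.shape), 0.0, 1.0)

# Encoder-decoder trained to map noisy images back to the clean originals.
autoencoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_noisy, x_train, epochs=1, batch_size=128)  # short run for illustration
```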
For my final-year project, I conducted deep learning research on audio-to-motion human motion generation (human synthesis). Traditional human synthesis relies heavily on visual data, which requires high-end hardware and is labour-intensive. For example, in 'Avengers: Endgame', one of the most popular movies of 2019, computer-generated imagery (CGI) was used to create realistic, lifelike avatars, and actors had to wear specialised hardware on their bodies. Using the fact that human motion and speech are correlated, human motion can instead be predicted from speech audio. The dataset was videos of the weekly presidential addresses of President Barack Obama. The audio of these videos was converted into a trainable dataset with MFCC features, and the facial motion was extracted using a facial behaviour analysis toolkit, OpenFace, which provides facial landmark coordinates from video footage. The research used representation learning (a denoising autoencoder) to reduce the dimensionality of the motion data and extract key features. First, an autoencoder was trained on the facial motion data (facial landmark coordinates) to learn a latent representation. Then a mapping from the MFCC features of the audio to this motion latent representation was learned using Gated Recurrent Unit (GRU) and fully connected layers. Finally, using the decoder of the motion autoencoder, the facial motion could be inferred from the audio data.
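A rough Keras sketch of the audio-to-latent mapping stage (MFCC extraction via librosa followed by GRU and fully connected layers); all dimensions, hyperparameters, and file names are illustrative assumptions, not those of the actual project.

```python
import librosa
from tensorflow.keras import layers, models

# Extract MFCC features from a speech clip (placeholder file).
audio, sr = librosa.load("speech_clip.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20).T  # shape: (time_steps, 20)

LATENT_DIM = 32  # assumed size of the motion autoencoder's latent space

# GRU + fully connected layers map a window of MFCC frames to a latent motion vector.
audio_to_latent = models.Sequential([
    layers.Input(shape=(None, 20)),
    layers.GRU(64, return_sequences=True),
    layers.GRU(64),
    layers.Dense(64, activation="relu"),
    layers.Dense(LATENT_DIM),
])
audio_to_latent.compile(optimizer="adam", loss="mse")
# In the project, the training targets were latent codes from the pre-trained motion
# autoencoder; its decoder then turns predicted latents back into facial landmarks.
```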
I collaborated with electrical engineers to build a smart classroom prototype that autonomously controls the classroom environment (temperature, brightness, and humidity) and the room's security. We built an IoT system with a CC3200 LaunchPad and created a data pipeline from the LaunchPad to the IBM Cloud over Wi-Fi. Using Node-RED on IBM Cloud, we developed a dashboard UI that monitors the room's conditions. As the only computer scientist in the team, I took charge of the data analysis part of the project, using Python pandas and NumPy. I went through an Exploratory Data Analysis (EDA) process to investigate the data in depth, evaluating statistical metrics and using data visualisation to find trends and features in the data.
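A small pandas sketch of the EDA steps described above; the CSV export and column names are placeholder assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sensor readings exported from the IoT pipeline (placeholder file and columns).
df = pd.read_csv("classroom_sensors.csv", parse_dates=["timestamp"])

# Summary statistics and missing-value check.
print(df[["temperature", "humidity", "brightness"]].describe())
print(df.isna().sum())

# Hourly means reveal daily trends in the room conditions.
hourly = df.set_index("timestamp")[["temperature", "humidity"]].resample("1H").mean()
hourly.plot(title="Hourly classroom conditions")
plt.show()
```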
This research investigated the trend of one of the largest sharing economy platforms, Airbnb. Initially, the Airbnb manifesto stated that Airbnb fosters social interactions between hosts and guests; however, there is an ongoing debate over whether Airbnb is becoming a purely business transaction platform, and this research aimed to answer that question. We carried out the research using reviews from English-speaking customers worldwide as the dataset. Using a pre-defined dictionary indicating which words belong to the 'business' domain or the 'social interaction' domain, each review could be labelled according to whether the user treats Airbnb as a business platform or as a social interaction platform. We did the data analysis mainly with Python pandas and NumPy, plus some NLP libraries (e.g. NLTK), to work out the proportion of people using Airbnb as a business platform versus a social interaction platform, and we compared the results with data visualisations. The visualisations showed that the social interaction aspect of Airbnb has been decreasing and that Airbnb has been turning into a business-oriented platform over the past ten years.
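A toy sketch of the dictionary-based labelling step; the word lists and review texts are invented for illustration, whereas the real study used a pre-defined dictionary.

```python
import pandas as pd
from nltk.tokenize import word_tokenize  # requires the NLTK 'punkt' tokenizer data

# Illustrative word lists standing in for the pre-defined dictionary.
BUSINESS_WORDS = {"location", "price", "clean", "checkin", "convenient"}
SOCIAL_WORDS = {"host", "friendly", "welcomed", "chat", "kind"}

reviews = pd.DataFrame({"text": [
    "The host was so friendly and we had a lovely chat",
    "Great location, clean flat and convenient checkin",
]})

def label_review(text: str) -> str:
    tokens = {t.lower() for t in word_tokenize(text)}
    social = len(tokens & SOCIAL_WORDS)
    business = len(tokens & BUSINESS_WORDS)
    return "social" if social > business else "business"

reviews["label"] = reviews["text"].apply(label_review)
print(reviews["label"].value_counts(normalize=True))  # proportion of each usage type
```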
I developed a web platform with two team members for a client from the Arthur Murray dance studio in the US. Arthur Murray is a company that teaches dance to customers throughout the US and the UK. The project aimed to develop a web application that allows Arthur Murray dancers to upload their dance performances for remote assessment. For a more accurate assessment, we used a Microsoft Kinect camera sensor that captures 3D human motion. I mainly took charge of developing the back end of the web application. The primary technology was Node.js (Express) with MongoDB as the database, and I implemented video upload, registration, login/logout, password encryption, and Google Maps integration.
A React web application that introduces UK secondary school education (A-levels) to Korean students and lets them find one-to-one online lessons with tutors in A-level subjects.
As a personal project, I developed an ERC-721 NFT on the Rinkeby testnet.
Final-year project on audio-driven human motion generation. Whereas traditional methods require high-end hardware and are labour-intensive, this approach only needs human speech to predict human facial motions.
Collaborated with students from other departments to deliver a smart classroom prototype that autonomously controls the classroom environment (temperature, brightness, and humidity) and monitors the room's security.
Investigated Airbnb usage trends to see whether Airbnb encourages social interactions between hosts and guests or whether it is (or is turning into) a business-oriented platform focused on profit.
Developed a web application in which users can upload their dance performance videos for a detailed assessment from anywhere in the world.
A web crawling programme with a GUI. The programme checks the stock of second-hand books at the Aladin second-hand bookshop and shows where you can buy the books you enter.
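A minimal sketch of the stock-checking crawl; the URL, query parameter, and CSS selector are placeholder assumptions, not Aladin's actual page structure.

```python
import requests
from bs4 import BeautifulSoup

def check_stock(title: str) -> list[str]:
    """Search a second-hand listing page and return the shops that stock the book."""
    resp = requests.get("https://example-bookshop.test/search",
                        params={"q": title}, timeout=10)  # placeholder endpoint
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Placeholder CSS class for the shop/branch entries on the results page.
    return [item.get_text(strip=True) for item in soup.select(".branch-name")]

if __name__ == "__main__":
    for shop in check_stock("Clean Code"):
        print(shop)
```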