I am an experienced and skilled software developer with a great passion for data. My greatest dream is to become a data scientist and use data to make the world a better place.
My name is Xu Li. I am a software developer currently based in Austin, TX. I am a 100% technical person, with a background in both Computer Science and Data Science. My ultimate career dream is to become a truly professional data scientist and use my insights from analyzing data to make a difference in the world.
I first experienced the charm of programming when I entered Beihang University as a computer science student. However, it was in my junior year that I fell in love with Data Science. The instructor of my data mining course showed me the powerful magic of analyzing data. Fascinated by both the process and the results, I decided to learn more knowledge, skills, and techniques related to data science. Thus, my journey to become a data scientist began.
This small but important decision has shaped my career path. After finishing my undergraduate program, I came to UT Austin to study data science and data analysis further. The course projects I was involved in at UT raised my understanding of data to a new level, not only in analyzing data efficiently but also in presenting it effectively. In addition, the five internships I have completed, two in Austin and three in Beijing, were all related in some way to data science, machine learning, big data, or databases. They gave me plenty of practical experience in specific data scenarios and helped me learn many new skills and tools for handling data.
My career path toward becoming a data scientist has just started, and there will undoubtedly be more “beautiful scenery” along the way. The process of analyzing data is also the process of learning about this world. It will be truly fascinating to see the day when I can use my knowledge of data to make an impact on this world and make it a better place!
Communicated with the product team to analyze related business concepts and data product requirements. Communicated with the engineering team to work out the backend architecture for data storage and data integration. Built prediction models for sales opportunities based on clients’ customer marketing-behavior data. Developed a score evaluation system that provided a probability and a score for every in-progress sales opportunity. This work was done mainly in Python.
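A minimal sketch of the scoring idea, assuming illustrative feature names and a logistic regression model (the real feature set and model are not shown here):

```python
# Hypothetical sketch: train a classifier on marketing-behavior features and rescale its
# predicted probability to a 0-100 score for each in-progress opportunity.
# Column names and the model choice are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("opportunities.csv")                     # one row per sales opportunity
features = ["email_opens", "site_visits", "demo_requests", "days_since_contact"]
X, y = df[features], df["converted"]                      # 1 = opportunity eventually won

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of conversion for each held-out opportunity, mapped to a 0-100 score.
proba = model.predict_proba(X_test)[:, 1]
scores = (proba * 100).round(1)
```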
Designed the architecture and main components of the KSQL (Kafka SQL) Server. Implemented the KSQL Server in two modes: interactive mode and headless mode. Interactive mode is suitable for short-term usage and supports access through REST APIs; headless mode is suitable for long-term usage and is driven by a deployed SQL query file. Developed the User Defined Function (UDF) module for both modes in Java. Added Prometheus and Datadog modules to visualize KSQL Server metrics and monitor its status. Improved the KSQL Server's log output and enabled checking its logs through Splunk. Built Docker containers for both modes and deployed them to AWS. Created a Maven archetype for the headless-mode KSQL Server, significantly improving deployment efficiency. Used Jira to record and track the key development and bug-fixing work on the KSQL Server. This KSQL project significantly shortened the development cycle for real-time data streaming applications.
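For the interactive mode, a client can submit statements over the REST API. A minimal sketch in Python, assuming the default ksqlDB endpoint and an illustrative stream definition:

```python
# Sketch of submitting a statement to an interactive-mode KSQL server over REST.
# The URL, content type, and stream definition are assumptions for illustration.
import requests

KSQL_URL = "http://localhost:8088/ksql"        # assumed default REST endpoint

statement = """
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    KSQL_URL,
    headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())   # the server reports the status of each executed statement
```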
Designed & Built Kafka Clusters to collect real-time user data and store into MySQL and MongoDB, using Python. Redesigned the structure of database and finished data migration with no loss or duplicates. Maintained & Integrated Spark Clusters with Kafka and databases, using Java. Improved the functionality of the product - Wechat Mini Program, using Python, Java and JavaScript.
Implemented a distributed system for filtering real-time network traffic data, using Java, Storm, Kafka, and ZooKeeper. Built and trained machine learning models to detect malicious JavaScript code extracted from HTML scripts. The system was applied to CERNET2, China's next-generation national research network, which fully adopts IPv6.
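A rough sketch of the classification idea, with character n-gram features standing in for the real feature extraction (the snippets, features, and model choice here are illustrative assumptions):

```python
# Featurize JavaScript snippets extracted from HTML and flag likely-malicious code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

scripts = ["document.write('hello')", "eval(unescape('%61%6c%65%72%74'))"]   # toy examples
labels = [0, 1]                                                              # 1 = malicious

# Character n-grams help capture obfuscation patterns such as eval/unescape chains.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    RandomForestClassifier(n_estimators=100, random_state=0),
).fit(scripts, labels)

print(clf.predict(["eval(String.fromCharCode(97,108,101,114,116))"]))
```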
Worked with the business department to confirm current requirements in data collection, data analysis, and data visualization. Established daily, weekly, and monthly data collection tasks against the Hive data warehouse, using SQL scripts. Built regression models to analyze the correlations among several important metrics, using R and Python. Designed and built daily, weekly, and monthly data visualization reports to reflect the state of the business, using Tableau.
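A small sketch of the kind of regression check run on the exported metrics, assuming illustrative column names for the Hive exports:

```python
# Ordinary least squares on weekly metrics: how do spend and active users relate to revenue?
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("weekly_metrics.csv")          # e.g., one row per week from the Hive export

X = sm.add_constant(df[["marketing_spend", "active_users"]])
model = sm.OLS(df["revenue"], X).fit()

print(df[["marketing_spend", "active_users", "revenue"]].corr())   # pairwise correlations
print(model.summary())                                             # coefficients, R-squared, p-values
```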
VertifyData is a data integration company that helps its customer companies bring their marketing data and sales data together. Sales opportunities are usually contacted and assessed manually, case by case, which is an uncertain and inefficient process. This capstone project aims to improve that situation: it uses activity data from those customer companies, builds machine learning models, and predicts the status of sales opportunities, automatically providing customers with strategic, valuable information to support better marketing analysis and sales results.
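As a sketch of how raw activity data might be rolled up into per-opportunity features before modeling (table and column names are assumptions, not the actual VertifyData schema):

```python
# Aggregate marketing touches into one feature row per sales opportunity.
import pandas as pd

activities = pd.read_csv("activities.csv")   # one row per touch: opportunity_id, type, timestamp

features = (
    activities
    .assign(timestamp=pd.to_datetime(activities["timestamp"]))
    .groupby("opportunity_id")
    .agg(
        touch_count=("type", "size"),
        distinct_channels=("type", "nunique"),
        last_touch=("timestamp", "max"),
    )
    .reset_index()
)
features["days_since_last_touch"] = (pd.Timestamp.now() - features["last_touch"]).dt.days
```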
Used data published by the Austin government to build a dashboard that summarizes crime events in Austin, helping students, teachers, and staff better understand the safety level around their neighborhoods. Built with Tableau.
The topic and data for this project came from the 2010 KDD Cup competition. Built and trained the prediction model, based on unsupervised learning and clustering algorithms, using Python. Improved the model's performance by optimizing feature selection, the underlying algorithms, and key parameters. The final model predicted students' performance on the test dataset with an RMSE of 0.41.
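A minimal sketch of the clustering-based prediction idea, assuming simplified feature columns and using each cluster's mean outcome as the prediction (the actual features and tuning are not shown):

```python
# Group similar student/step records with k-means, predict each test record as the mean
# outcome of its nearest cluster, and score the predictions with RMSE.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
feature_cols = ["hints_used", "attempts", "time_on_step"]   # illustrative features

kmeans = KMeans(n_clusters=8, random_state=0).fit(train[feature_cols])
train["cluster"] = kmeans.labels_
cluster_means = train.groupby("cluster")["correct_first_attempt"].mean()

preds = cluster_means.loc[kmeans.predict(test[feature_cols])].to_numpy()
rmse = np.sqrt(mean_squared_error(test["correct_first_attempt"], preds))
print(f"RMSE: {rmse:.2f}")
```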
Analyzed the requirements and designed the general architecture and components of the website. Developed and implemented the website step by step, using HTML, JavaScript, PHP, Java, and MySQL. Tested each part of the website and wrote development documentation.
Designed a simplified C compiler consisting of four parts: Scanner, Parser, Semantic Analyzer, and Assembly Generator. Implemented and tested the compiler based on this design, using C++. Given C source files, the compiler outputs MIPS assembly code that runs on the MARS platform.
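To give a flavor of the scanner stage (the actual project was written in C++), here is a toy tokenizer sketched in Python with a deliberately simplified token set:

```python
# Turn a C snippet into a (kind, text) token stream with a few regex rules.
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=<>!]=?|[;(){},]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Yield (kind, text) tokens; keywords are split out from identifiers."""
    keywords = {"int", "if", "else", "while", "return"}
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ID" and text in keywords:
            kind = "KEYWORD"
        yield kind, text

print(list(scan("int x = 42; if (x > 0) return x;")))
```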
GPA: 3.89
GPA: 3.60
Apart from being a developer, I am also a fan of Japanese anime. I believe anime is a great medium, just like TV dramas and movies. An excellent anime can stir strong emotions in its audience, or vividly convey ideas that are abstract but very important. Moreover, because of its form of expression, anime can handle nearly any topic, even ones that may seem difficult for the Hollywood industry, since you can create anything through drawing. Here are some of my favorites: Your Lie in April, Steins;Gate, Fate/Zero, K-ON, IDOLM@STER, LOVELIVE!, etc. The full list is actually very long; I can only show you this tiny part.
Besides anime, I am also very fond of reading manga and light novels and playing video games in my spare time. Contrary to the traditional view that these are merely entertainment or a waste of time, I think they are great media forms as well. Their innovative ways of storytelling often surprise me, refresh my mind, and broaden my horizons. If we have any of the above in common, please feel free to contact me; we could definitely have a good talk!