“Autonomous Driving at Baidu”
“Big Data at Baidu”
*details limited by NDA
“Scientific Software Network Map”
The “Scientific Software Network Map” is part of an NSF-funded project (SciSIP: The Scientific Software Network Map) to map out the dependencies among scientific software packages. The goal is to help scientists monitor how the software they write is being used, so that they can focus updates and improvements on the areas that need them most, understand more deeply how their packages are used together, and take credit for their contributions by documenting how much their software is used.
- Collaborated with teammates to design and develop the web app for the Scientific Software Network Map described above.
- Collaboratively built the data pipeline to extract, clean, store, and explore the dataset with Python and MySQL, based on data from Texas Super-Computing Center.
- Collaboratively developed the web application that hosts interactive diagrams, charts, and a force-directed diagram with Pyramid, Python, D3.js, Jinja2, HTML, CSS, and Bootstrap.
- Acted as Scrum Master, Planning Manager, and Risk Manager in the agile development process. Planned and tracked the project at both the strategic and tactical levels. Established a risk management process, then tracked and mitigated risks.
- Independently designed a Usability Test Template based on the User Behavior Model.
“MapReduce Facility”
Implemented a MapReduce facility from scratch.
- Collaborated with one teammate to design and implement, in Java and from scratch, a MapReduce facility and a distributed file system: a MapReduce engine similar to Hadoop but aimed at smaller datasets.
- Independently designed and developed the distributed file system, similar to HDFS and AFS, which serves as the base for the MapReduce engine and relies on a central coordinator.
- Collaborated with teammates to develop concurrency control, failure recovery, and system monitoring, and to conduct performance optimization.
- Collaborated with teammates to test the system and to document the system architecture with architecture drivers, key architecture decisions, and static, dynamic, and physical architectural views.
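The map, shuffle, and reduce flow behind an engine of this kind can be illustrated with a minimal single-process sketch. It is not the project's code: the class `MiniMapReduce`, its `Mapper` and `Reducer` interfaces, and the word-count job are hypothetical names chosen for illustration, and distribution across machines, the file system, and failure recovery are all omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical single-process illustration of map -> shuffle -> reduce.
class MiniMapReduce {

    /** Map callback: emits zero or more (key, value) pairs for one input record. */
    interface Mapper<I, K, V> {
        void map(I record, BiConsumer<K, V> emit);
    }

    /** Reduce callback: folds all values collected for one key into one result. */
    interface Reducer<K, V, R> {
        R reduce(K key, List<V> values);
    }

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> input, Mapper<I, K, V> mapper, Reducer<K, V, R> reducer) {
        // Map + shuffle: group every emitted value under its key (TreeMap keeps keys sorted).
        Map<K, List<V>> groups = new TreeMap<>();
        for (I record : input) {
            mapper.map(record, (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce: one call per distinct key.
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : groups.entrySet()) {
            result.put(entry.getKey(), reducer.reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    /** Example job: count how often each whitespace-separated word occurs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        return MiniMapReduce.<String, String, Integer, Integer>run(
                lines,
                (line, emit) -> {
                    for (String word : line.toLowerCase().split("\\s+")) {
                        if (!word.isEmpty()) {
                            emit.accept(word, 1);
                        }
                    }
                },
                (word, ones) -> ones.size());
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("a b a", "b c")));
    }
}
```

In a distributed engine the grouping step would be replaced by partitioning keys across worker nodes, with intermediate data written to the distributed file system.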
My Java RMI Facility
Implemented a Java Remote Method Invocation (RMI) facility from scratch.
- Implemented RMI-Registry, a server program that manages the remote objects provided by the server, supporting listing, looking up, binding, rebinding and unbinding remote objects.
- Utilized Java Dynamic Proxies to generate client-side stubs at runtime.
- The RMI facility supports both pass by value and pass by reference.
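As a rough illustration of the registry operations and the runtime-generated stubs described above, the following sketch builds a stub with `java.lang.reflect.Proxy`. It is a simplified assumption, not the project's implementation: `MiniRmi`, `Registry`, and `Greeter` are hypothetical names, the registry lives in the same process, and the invocation handler calls the target directly at the point where a real stub would marshal the call over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process illustration of a registry plus dynamic-proxy stubs.
class MiniRmi {

    /** Hypothetical remote interface used only for this example. */
    interface Greeter {
        String greet(String name);
    }

    /** In-memory stand-in for the registry: maps names to remote objects. */
    static final class Registry {
        private final Map<String, Object> bindings = new ConcurrentHashMap<>();

        void bind(String name, Object remote) {
            if (bindings.putIfAbsent(name, remote) != null) {
                throw new IllegalStateException("already bound: " + name);
            }
        }

        void rebind(String name, Object remote) {
            bindings.put(name, remote);
        }

        void unbind(String name) {
            bindings.remove(name);
        }

        String[] list() {
            return bindings.keySet().toArray(new String[0]);
        }

        /** Returns a stub generated at runtime that implements the given interface. */
        <T> T lookup(String name, Class<T> iface) {
            Object target = bindings.get(name);
            if (target == null) {
                throw new IllegalArgumentException("not bound: " + name);
            }
            InvocationHandler handler = (proxy, method, args) -> {
                // A networked stub would serialize the method name and arguments here.
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                }
            };
            Object stub = Proxy.newProxyInstance(
                    iface.getClassLoader(), new Class<?>[] {iface}, handler);
            return iface.cast(stub);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.bind("greeter", (Greeter) name -> "Hello, " + name);
        Greeter stub = registry.lookup("greeter", Greeter.class);
        System.out.println(stub.greet("world"));
    }
}
```

Because the stub is generated from the interface at runtime, no stub classes need to be compiled ahead of time for each remote interface.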
“Yelp-Insighter” Big Data Pipeline and Interactive Visualization
Collaborated with teammates to design and develop a data pipeline and an interactive visualization web application that provides marketplace insights to restaurant owners and investors through interactive charts and diagrams for each restaurant category, word clouds for top restaurants, and an interactive geographical distribution map.
- Collaboratively developed the data pipeline to extract, clean, store, and explore the dataset with Python and Google BigQuery.
- Independently developed an interactive heat map and geo-distribution map of restaurants with Google Fusion Tables and Google Maps.
- Collaboratively developed the web application that hosts interactive diagrams, charts, and maps with D3.js, Jinja2, webapp2, Google App Engine, HTML, CSS, and Bootstrap.
- Independently designed and produced a promotional video that introduces the project.
Parallel Data Processing: DNA Strands & Data Clustering with MPI
Designed and implemented a parallel data clustering algorithm on distributed machines.
- Collaborated with one teammate to develop a parallel algorithm for distributed processing, using Open MPI for communication between processes, that efficiently solves the clustering problem on DNA strands, an important class of problems in domains including data mining and statistical data analysis.
- Independently designed and conducted the scalability analysis for distributed parallel processing on various degrees of parallelism and dataset sizes.
Prediction System for Animal Outcome (Machine Learning)
Designed and implemented a system that predicts the outcome of animals in an animal shelter with machine learning algorithms.
- Independently designed and developed an Animal Outcome Prediction System, which uses data from an animal shelter to predict the adoption outcome of an animal based on its “feature set”.
- Independently developed statistical tests with Python and SciPy to check for significant differences in attributes that might predict adoption.
- Independently developed the feature set used for prediction, trained classifiers using Naive Bayes and Decision Tree, and tested them with 10-fold cross-validation, using SciPy, scikit-learn, and NumPy.