Bird

Tinkerer, Programmer, Student Always.

My Passion for Data Science

I've always been humbled by the importance of data. As information takes on a prominent role as one of the modern world's most valuable resources, I aim to be at the forefront of the technologies in this field, understanding how we collect, analyze, and interpret data. From optimization to discovery, data science serves as the modern basis for technological innovation and development. We use machine learning algorithms to map genetic code, artificial intelligence to power interactive hardware devices, and massive databases to produce valuable information from what we already know. As someone who has always believed in using knowledge to empower people of all races, genders, and backgrounds, I see data science as more than a career: it's a philosophy.


Experience

Research Collaborator with The Bengson Research Laboratory at Sonoma State University – 2018
Working with a team of researchers to apply machine learning to alpha, beta, and theta waves collected from EEG electrodes in subject trials, in order to computationally predict individualized occipital lobe activation. Initial research suggests that a brain-computer interface is feasible.


Undergraduate Research Assistant at California Institute for Energy and Environment (CIEE) – 2018
Working with a team of researchers using the Extensible Building Operating System (XBOS) for energy usage predictions. Exploring Gaussian process and recurrent neural network (RNN) models.


Data Science Intern at Castlight Health – 2017
Matched externally sourced entities (hospitals, non-hospital facilities, and practitioners) to entities within our database. The de-duplication process used machine learning to optimize similarity matches. Used hard negative mining to improve our small training dataset (fewer than 1K examples). A gradient-boosted classifier, with hyperparameters tuned by grid search and retrained on hard samples, achieved 85-95% precision and recall depending on the entity. Validated with k-fold cross-validation. Used the receiver operating characteristic (ROC) curve and area under the curve (AUC) as additional measures of performance.


Data Science Intern at RivieraPartners — 2016
Used a gradient-boosting algorithm to determine company team sizes based only on publicly available information. Also built a Python wrapper for a time-series survival model written in R, then built algorithms to pre-process data for the model. Set up a Flask API route and worked with the team to deploy it on a production server. Also worked on an automated scraping project.


Undergraduate Research Apprentice Program (URAP) at Berkeley Institute of Data Science (BIDS) — Spring 2016
Worked on the Mapping Team to help map UC Berkeley course progression through different majors. Computationally organized taxonomies of classes, ran deduplication processes, and helped plan data visualization.


Data Science Intern at Doximity — 2015
Used a supervised machine-learning algorithm with a gradient-boosting classifier to identify malformed articles scraped from other sites, using BeautifulSoup, NLTK, and data visualization tools. Triangulated doctor names in news articles to facilities and database profiles using reverse geocoding and fuzzy string matching.


Web Developer Intern at PriceWater Capital — 2014
Maintained a website using HTML, CSS, and Twitter Bootstrap for front-end development while also making backend adjustments using PHP and MySQL.


Android App Developer Intern at New Jersey City University — 2013
Worked as an associate on an Android application development team to build a tool for medical offices that submits information through a secure form using PHP and Ajax. Personally helped develop some of the form pages.


Web Developer Intern at Monmouth University; West Long Branch, New Jersey — 2012
Programmed an FAQ with a database. Worked with MySQL, PHP, HTML, CSS, and JavaScript to develop a template structure implemented by the university.


Industry Projects

Castlight Health

Model tuning, training, and development. Data visualization and analysis.

RivieraPartners

Model tuning, training, and development. Data visualization, analysis, pipelining, and infrastructure.

Doximity

Machine learning algorithm, data visualization, geolocation and fuzzy-string matching.



Personal Projects

WriteMind (2018)

Developing a private blog site that lets users publish entries to a private account, then analyzes those entries to track sentics based on their writing (aptitude, attention, pleasantness, sensitivity, moods, sentiment, etc.).

Sentic Python Package (2018)

Built the Sentic package API, which can conduct sentiment, mood, and semantic analysis in multiple languages. Based on the SenticNet 4 tagged dataset from http://sentic.net/.
(Repo Here)

Topic Recommender Using Ubuntu Forum Files (2017)

Used Ubuntu forums data to extract the top ten relevant words of each conversation using Latent Dirichlet Allocation (LDA).


ClimateChase (2016)

Flask/React game where the player tries to balance energy investments between nuclear, solar, wind, and fossil fuels.

PDF-To-Audiobook Converter (2016)

Used Google's Python Tesseract package in combination with macOS's Siri interface to extract text from PDF text and images and turn it into audio files.

Histogram Plotter & Filter (2015)

Sequential histogram plotter with a terminal interface. Lets the user enter constraints to narrow down their search. (Python)




Extended Class Projects

Groupie (2015)

Website where people can form groups and meet up with others (Ruby on Rails, HTML/CSS, Bootstrap).

XRP Trade Algorithm (2014)

Built a Python wrapper and terminal interface for the Ripple (XRP) API, as well as a sentiment-analysis algorithm to automate exchanges - 3rd Place in Ripple API Contest. Changes in the client API have since rendered this project deprecated.