Bird

Machine Learning Engineer / Data Scientist / Backend Developer

My Passion for Data Science

I've always been humbled by the importance of data. As information becomes one of the modern world's most valuable resources, I aim to be at the forefront of the field: understanding how we collect, analyze, and interpret data. From optimization to discovery, data science serves as the modern basis for technological innovation and development. We use machine learning algorithms to map genetic code, artificial intelligence to power interactive hardware devices, and massive databases to produce valuable information from what we already know. As someone who has always believed in using knowledge to empower people of all races, genders, and backgrounds, I see data science as more than a career. It's a philosophy.


I've been coding professionally since I was 16. Over the years, I've worked with and assisted PhD students, senior and principal data scientists, backend engineers, and other machine learning engineers. Not only am I a strong coder, but I also focus on identifying quantifiable metrics and business value before beginning any machine learning project. In my experience, many companies and individuals waste resources when basic assumptions aren't examined or met, and the consequences only surface months or even years later. To mitigate this risk, I establish a clear end goal before any endeavor. I also communicate with and seek regular feedback from engineering, business, and product teams to fully understand the constraints and challenges before me. The result has been half a dozen production-quality models that solve real business problems and deliver value. I'm an engineer who understands statistics, business value, and time constraints. I'm your unicorn.

Experience

Machine Learning Engineer at Freenome – 2020 - 2024

  • Core contributor to Freenome's 2nd and 3rd iterations of the Multiomic Research Platform. Led Metadata Storage redesign, object storage cost management, and rebuilt the majority of the protein data transformation pipeline.

  I've personally:
  • Earned a Servant Leadership Award by votes from managers and peers.
  • Spearheaded object cloud storage management & retention policies for two years.
  • Saved $8M+ in Google Cloud Platform (GCP) costs.
  • Designed protocols for data life-cycling.
  • Built tools for GCP cost analytics and transparency.
  • Built most of the test suite for the ML Research Platform’s Execution Engine.
  • Built custom workflows with Flyte for large-scale data processing and ML computation.
  • Helped build an ML platform that can support high data throughput.
  • Taught git to more than 50 employees through internal instructional series.
  • Refactored 75% of Freenome’s legacy machine learning execution framework.
  • Built Machine Learning analytic tools for loading experiments to analyze results.
  • Helped redesign and implement Freenome’s newest Machine Learning Platform.
  • Helped establish CI/CD procedures including internal software package management and linting.
  • Built one of the most used ML execution tools for classical and deep ML experiment analysis.
  • Wrote most of the documentation for internal ML tools, protocols, and terminology, including system designs and rationales for a year or two.
  • Summarized and presented deep-dives of multiple software tools and algorithms (Vaex, Polars, Hyperband, distributed computational strategies from AAPI 2021).
  • Built tools with MLflow for researchers to analyze metrics from ML experiments.
  • Helped debug ML experiments written by researchers.
  • Fostered a supportive and collaborative team culture.
  • Optimized code for memory and compute efficiency, as well as legibility.

Machine Learning Engineer at Change Healthcare – 2020

  • Created a classification model for chart-splitting across sensitive PDFs, run in a distributed fashion, with 70% AUC.
  • Built task-assignment ML model that saved $7M.
  • Prototyped chatbots with Rasa and GPT-3.

Data Scientist at RivieraPartners – 2019

  • Designed and trained classifier to surface quality tech talent candidates.
  • Designed and trained a regression model to predict team sizes managed by a candidate.
  • Prototyped a ranking model for matching recruiting prospects to jobs using a custom NDCG listwise loss function.
  • Improved a survival model that estimates the likelihood of a candidate leaving a job within a given time frame.
  • Established end-to-end environment for rapid model prototyping and development (single-machine, CPU).
  • Techniques and tools: TF-IDF, Word2Vec, one-hot encoding, neural networks, SVD, NDCG, ListMLE, XGBoost, and Hyperband.

Research Collaborator with The Bengson Research Laboratory at Sonoma State University – 2018
    Worked with a team of researchers to apply machine learning to alpha, beta, and theta waves collected from EEG electrodes in subject trials, with the goal of computationally predicting individualized occipital lobe activation. Initial research suggested the feasibility of a brain-computer interface.


Undergraduate Research Assistant at California Institute for Energy and Environment (CIEE) – 2018
    Worked with a team of researchers using the Extensible Building Operating System (XBOS) to predict energy usage. Explored Gaussian process and recurrent neural network (RNN) models.


Data Science Intern at Castlight Health – 2017
    Matched externally sourced entities (hospitals, non-hospital facilities, and practitioners) to entities within our database. The de-duplication process used machine learning to optimize similarity matching. Used hard negative mining to improve our small training dataset (fewer than 1K examples). A gradient-boosted classifier, with hyperparameters tuned by grid search and retrained on hard samples, achieved 85-95% precision and recall depending on the entity type. Validated with k-fold cross-validation. Used the receiver operating characteristic (ROC) curve and area under the curve (AUC) as additional performance measures.


Data Science Contractor at RivieraPartners – 2016
    Used a gradient boosting algorithm to estimate company team sizes based only on publicly available information. Built a Python wrapper for a time-series survival model. Set up Flask model-serving infrastructure.


Undergraduate Research Apprentice Program (URAP) at Berkeley Institute for Data Science (BIDS) – Spring 2016
    Worked on the Mapping Team to help map UC Berkeley course progression through different majors. Computationally organized taxonomies of classes, ran deduplication processes, and helped plan data visualization.


Data Science Contractor at Doximity – 2015
    Used a supervised gradient boosting classifier, together with BeautifulSoup, NLTK, and data visualization tools, to identify malformed articles scraped from other sites. Triangulated doctor names in news articles to facilities and database profiles using reverse geocoding and fuzzy string matching.


Web Developer Intern at PriceWater Capital – 2014
    Maintained a website using HTML, CSS, and Twitter Bootstrap for front-end development while also making backend adjustments using PHP and MySQL.


Android App Developer Intern at New Jersey City University – 2013
    Worked on an Android application development team to build a tool for medical offices that submits information through a secure form using PHP and Ajax. Personally helped develop some of the form pages.


Web Developer Intern at Monmouth University; West Long Branch, New Jersey – 2012
    Programmed an FAQ with a database. Worked with MySQL, PHP, HTML, CSS, and JavaScript to develop a template structure implemented by the university.


Industry Projects

    Myndful.us

    Built an ML-powered habit-tracking app for mental health. Managed a team of 8.

    Castlight Health

    Model tuning, training, and development. Data visualization and analysis.

    RivieraPartners

    Model tuning, training, and development. Data visualization, analysis, pipelining, and infrastructure.

    Doximity

    Machine learning algorithm development, data visualization, and geolocation/fuzzy string matching.



Personal Projects

    WriteMind (2018)

    Developing a private blog site that lets users publish entries to a private account, then analyzes their writing to track sentics (aptitude, attention, pleasantness, sensitivity, mood, sentiment, etc.).

    Sentic Python Package (2018)

    Built the Sentic package API, which can conduct sentiment, mood, and semantic analysis in multiple languages. Based on the SenticNet 4 tagged dataset from http://sentic.net/.

    Topic Recommender Using Ubuntu Forum Files (2017)

    Used Ubuntu forums data to extract the ten most relevant words of each conversation using Latent Dirichlet Allocation (LDA).


    ClimateChase (2016)

    Flask/React game in which the player tries to balance energy investments between nuclear, solar, wind, and fossil fuels.

    PDF-To-Audiobook Converter (2016)

    Used Google's Python Tesseract package in combination with macOS's Siri interface to extract text from PDF text and images and convert it into audio files.

    Histogram Plotter & Filter (2015)

    Sequential histogram plotter with a terminal interface. Lets the user enter constraints to narrow down their search. (Python)


    Groupie (2015)

    Website where people can form groups and meet up with others (Ruby on Rails, HTML/CSS, Bootstrap).

    XRP Trade Algorithm (2014)

    Built a Python wrapper and terminal interface for the Ripple (XRP) API, as well as a sentiment-analysis algorithm to automate exchanges. Won 3rd place in the Ripple API Contest. Changes in the client API have since rendered this project deprecated.