Portfolio
Data Science and Business Analytics
-
Created a Python ML program with predictive models to consult organizations on how their diversity benchmarks compare to diversity metrics for Fortune 500 companies. ____________________________
The dataset was compiled by data journalist Grace Donnelly in an effort to understand where the country’s top companies stand on their diversity promises ("Why We Logged Every Fortune 500 Company's Diversity Data or Lack Thereof").
_____________________________
The analysis calculates diversity scores using the Simpson Diversity Index model. This scoring model is often used by biologists to measure species diversity, which makes it a reasonable model for scoring ethnic and gender diversity under the assumption that all dimensions (7 races and 2 genders) should be evenly divided. A “perfect” score would be 1, meaning that the company is evenly divided amongst all races and genders.
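The scoring described above can be sketched in Python. The normalization step (dividing by the even-split maximum so a perfectly even company scores exactly 1) is my assumption about how the score was scaled to reach 1, since the raw Gini-Simpson index tops out below 1:

```python
def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2), where p_i is each group's share."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def normalized_simpson(counts):
    """Scale so an even split across all n groups scores exactly 1.

    The raw index for an even split over n groups is 1 - 1/n, so we
    divide by that maximum (an assumed normalization, for illustration).
    """
    n = len(counts)
    if n < 2:
        return 0.0
    return simpson_diversity(counts) / (1.0 - 1.0 / n)

# Example: headcounts evenly split across 7 racial groups score 1.0
even_seven = normalized_simpson([10] * 7)
```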
_____________________________
Techniques used for predictive modeling: scikit-learn for test/train splits, the Simpson Diversity Index, K-means clustering, and decision trees.
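As an illustration of the clustering step, here is a minimal K-means sketch with scikit-learn. The per-company gender and ethnicity scores below are hypothetical placeholders, not values from the actual dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per company,
# columns = [gender diversity score, ethnic diversity score]
scores = np.array([
    [0.95, 0.40], [0.90, 0.35],   # high gender, low ethnic diversity
    [0.50, 0.85], [0.55, 0.80],   # moderate gender, high ethnic diversity
    [0.20, 0.15], [0.25, 0.20],   # low on both dimensions
])

# Group companies into 3 diversity profiles
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
labels = km.labels_  # cluster assignment for each company
```

Clustering companies this way lets a consultant benchmark a client against the profile of its nearest cluster rather than against the full Fortune 500 average.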
_____________________________
Please note that this program was made for a hypothetical strategic DEI consulting scenario as part of my graduate coursework, and should NOT be used as a real system to measure diversity in an organization in its current state.
Here’s why:
The dataset is from 2017, and it shows that all of these companies still had a lot of DEI work to do at the time.
This dataset only includes race/ethnicity and binary gender markers. In a perfect world, gender data would include genders besides Male and Female (e.g., Non-Binary, Gender Non-Conforming, and “other”).
Disability data is not included either. This is an important aspect of measuring diversity, as 15% of the world’s population is disabled. A company that is truly seeking accountability in its DE&I work should ensure that space at the table is intentionally made for disabled people.
-
Collaborated with a start-up to build an environmental AI/imagery machine learning product, using satellite imaging and computer vision techniques to detect areas at risk of flooding and other environmental changes in the “heat islands” that occur in large cities globally.
_________________________________
Personal contributions included: product analysis; data wrangling for testing, training, and model validation; Python ML code improvements; feasibility assessments of image modalities; and open-source code research.
-
Performed a sentiment analysis of Airbnb listing data by scraping guest stay reviews and running lexicon-based analysis with the Python TextBlob library to classify reviews as positive, neutral, or negative.
Sentiments were converted to numerical scores for analysis, visualized with matplotlib, and added to a marketing dashboard built in Tableau.
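A minimal sketch of the classification step, assuming polarity scores have already been obtained from TextBlob's `sentiment.polarity` (a float in [-1, 1]). The ±0.05 neutral band is an illustrative threshold, not necessarily the one used in the project:

```python
def label_review(polarity, neutral_band=0.05):
    """Map a TextBlob-style polarity score in [-1, 1] to a sentiment label.

    Scores inside the neutral band around zero are treated as neutral;
    the band width is an assumed tuning parameter.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

# Example: polarity scores for three hypothetical reviews
labels = [label_review(p) for p in (0.6, -0.4, 0.01)]
```

Keeping the numeric polarity alongside the label is what allows the scores to feed directly into matplotlib charts or a Tableau dashboard.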
-
Built an NLP project in Python that parses tweets to understand the general public’s views on various presidential candidates ahead of the 2020 elections. TextBlob and tweepy were used to process the tweets and categorize tweet language into positive, negative, and neutral segments.
-
Created a market analysis for items listed as “fine arts” by mining data from the Etsy.com REST API.
The presentation audience was assumed to be artists who needed to learn how to price their art during the COVID-19 pandemic without access to galleries.
-
Created and visualized a financial analysis of Yahoo Finance data for Yelp and its competitors using Python. The purpose of the project was to provide data-informed support for a proposal for Yelp’s expansion to European markets. Scikit-learn was used for linear regression and matplotlib for data visualization.
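A minimal sketch of the regression step with scikit-learn, fitting a revenue trend and extrapolating one year forward. The revenue figures below are hypothetical placeholders, not actual Yahoo Finance data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical annual revenue (in $M) for a single company
years = np.array([2015, 2016, 2017, 2018, 2019]).reshape(-1, 1)
revenue = np.array([550, 610, 680, 740, 810])

# Fit a linear trend and project the next year
model = LinearRegression().fit(years, revenue)
pred_2020 = model.predict(np.array([[2020]]))[0]
```

The fitted slope (`model.coef_[0]`) gives the average year-over-year growth, which is the kind of trend figure that supports an expansion proposal.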
-
Performed and presented an in-depth 10-year financial analysis of FedEx and competitors (UPS, Stamps.com). Using Python with scikit-learn for linear regression and matplotlib/Tableau for data visualization, a 10-page analysis made recommendations on the evaluation of FedEx’s stock value, performance, and perceived risk.
-
Enriched a movie streaming client’s dataset by wrangling IMDB website data for the top 500 adult movies.
Techniques used to complete the project include web scraping, data wrangling, and transformation in Python; Alteryx was used to automate the data workflow.
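A minimal web-scraping sketch with BeautifulSoup, run here on a hypothetical local HTML snippet. The markup below is invented for illustration; scraping real IMDB pages requires inspecting the live HTML and respecting the site's terms of use:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for an IMDB-style listing page
html = """
<ul>
  <li class="movie"><span class="title">Example One</span><span class="rating">8.1</span></li>
  <li class="movie"><span class="title">Example Two</span><span class="rating">7.4</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Extract each movie's title and rating into a row for the enriched dataset
movies = [
    {
        "title": li.select_one(".title").text,
        "rating": float(li.select_one(".rating").text),
    }
    for li in soup.select("li.movie")
]
```

Rows shaped like this can then be joined onto the client's existing dataset, with the repeated fetch-and-parse steps automated in a tool such as Alteryx.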