Friday, January 8, 2021

How to Classify Data with Machine Learning.


Figure 1: Decision tree graph built with classification data linked below.

For new programmers, building a machine learning model can be an intimidating task.  Machine learning requires basic knowledge of programming, access to a dataset to build a model from, and knowledge of how to implement the actual AI.

Like many problems, Machine Learning can be simplified by breaking it down into many small pieces.  The data science website Kaggle also does a good job of helping developers learn programming.  Kaggle offers free access to hundreds of open datasets, as well as the ability to create notebooks to access and make predictions from this data.  As a result, developers spend less time trying to install dependencies or looking for data, and more time accomplishing their goals.

To demonstrate some basic Machine Learning techniques, I created a Python notebook in Kaggle from the dataset "Mushroom Classification".  A Python notebook is like a Word document that also allows a developer to run code in small chunks.  Notebooks are a great tool not only for splitting up large tasks, but also for conveying the results of a program to non-technical people.  As an example of classifying data, I created a notebook guide linked here on GitHub.  To run the notebook, either download the dependencies and dataset from Kaggle, or create a notebook in Kaggle from the first link.

Figure 2: A graph using SHAP, a strategy to analyze the thought process of an AI.

In the Mushroom Classification dataset, the authors collected information about thousands of mushrooms, such as size and shape, with the goal of using these features to identify poisonous ones.  The guide I linked on GitHub shows examples of how to write several Machine Learning models, as well as how to display results and minimize bias.
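To make the idea of classifying from features concrete, here is a minimal sketch of a one-level decision tree (a "decision stump") in plain Python.  It is not the code from the linked notebook.  The six rows and the column names `odor` and `cap_shape` are made-up stand-ins for the real dataset, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows; the real dataset has thousands of rows and more columns.
ROWS = [
    {"odor": "foul", "cap_shape": "convex", "label": "poisonous"},
    {"odor": "foul", "cap_shape": "flat", "label": "poisonous"},
    {"odor": "none", "cap_shape": "convex", "label": "edible"},
    {"odor": "none", "cap_shape": "flat", "label": "edible"},
    {"odor": "almond", "cap_shape": "convex", "label": "edible"},
    {"odor": "pungent", "cap_shape": "flat", "label": "poisonous"},
]


def fit_stump(rows, feature):
    """Map each value of `feature` to the most common label seen with it."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[feature]][row["label"]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def accuracy(rows, feature, rule):
    """Fraction of rows whose label matches the rule's prediction."""
    correct = sum(rule.get(row[feature]) == row["label"] for row in rows)
    return correct / len(rows)


def best_feature(rows, features):
    """Pick the single feature whose stump classifies the rows best."""
    return max(features, key=lambda f: accuracy(rows, f, fit_stump(rows, f)))


# Note: this scores the stump on its own training rows; a real evaluation
# should hold out separate test data.
chosen = best_feature(ROWS, ["odor", "cap_shape"])
print(chosen, fit_stump(ROWS, chosen))
```

A full decision tree repeats this "pick the best split" step on each branch, which is what a library implementation automates.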

The linked guide also gives examples of how to explain the results of different AI methods using tools such as SHAP.

Monday, April 9, 2018

Hockey Stick Growth

Hockey Stick Growth happens when a startup has slow growth for a period, followed by an inflection that leads to explosive growth.

A popular example of a hockey stick company is Groupon.  The company's founder Andrew Mason originally started the company as a pivoted version of his previous company, a social web platform that only gained modest traction.

Groupon was founded in 2008, and during 2009, its subscribers tripled every quarter, and doubled every quarter in 2010.  After 16 months of business, the company was valued at $1 billion, and had over $170 million in funding.  During those 16 months, the company scaled its employees from a few dozen to over 350.

Companies whose growth spikes carry them to billion-dollar valuations are often called Unicorns, and include companies such as Groupon, Google, and Uber.  While most startups don't have the explosive growth of Unicorns, successful companies prepare for scaling.

Scaling a company starts at a local level.  The percent of the startup's target market that uses the company's app or website is called Market Penetration.  By tracking the number of users, the amount spent on acquiring new users, and the customers' lifetime value, you can predict how your growth will scale as you move into new territories.
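As a rough illustration of the metrics above, here is a small Python sketch.  Every number is made up, the function names are hypothetical, and the projection simply assumes a new market behaves like the first one, which will not always hold.

```python
def market_penetration(users, target_market_size):
    """Share of the target market that uses the product."""
    return users / target_market_size


def customer_acquisition_cost(acquisition_spend, new_users):
    """Average amount spent to acquire one new user."""
    return acquisition_spend / new_users


def projected_users(penetration, new_market_size):
    """Naive projection: assume the same penetration in a new market."""
    return round(penetration * new_market_size)


# Hypothetical local market: 2,000 users out of 10,000 potential customers,
# acquired with $5,000 of spend, each worth $10 over their lifetime.
penetration = market_penetration(2_000, 10_000)
cac = customer_acquisition_cost(5_000, 2_000)
ltv_to_cac = 10 / cac

print(f"penetration: {penetration:.0%}")
print(f"cost per user: ${cac:.2f}, lifetime value per dollar spent: {ltv_to_cac:.1f}")
print(f"projected users in a 50,000-person market: {projected_users(penetration, 50_000)}")
```

If the lifetime value per user is well above the cost of acquiring them, spending more to enter the next market is easier to justify.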

By tracking your success in your local market, you can also plan ahead for scaling.  Even online products need more employees to scale well.  When an online platform scales, the company needs more developers to make sure the platform can sustain thousands and even millions of users.  An influx of users means more eyes on your page, which also means a higher chance of a user finding a bug in your software.  Additional developers are needed to make sure the server stays running, bugs are swiftly fixed, the platform is secure, and the app is fully optimized with all analytics being tracked.

Your first target market penetration is also a solid indicator of your company's future.  If you capture half of your target audience in your local market, you should expect similar success in your next markets, and should prepare for quick scaling (assuming your app has a low customer churn).

On the other hand, your first market penetration can let you know when to pull the plug.  It is important to scale your company when you gain traction.  Failing to make needed hires or gain funding to expand can make you miss a big growth opportunity.
However, scaling your company too fast can also have heavy consequences.  The 'Amazon for food' company Webvan was infamous for scaling its company without market validation.  The company raised $800 million in funding, despite having very little market success. 

When your company grows, it is important to hire co-founders and employees whose skills also scale.  One lesson Facebook learned was to not pay temporary workers in equity.  One of their office painters earned shares that are now worth $200 million, a deal made instead of paying the painter a few thousand in cash.

Monday, April 2, 2018

Heartbeat Recognition In Python

Biometric passwords are becoming more mainstream in society.  Iris scans, facial recognition, and fingerprints are slowly replacing passwords, as they are unique to each person and can't be forgotten.

But how does this tech work?

When you add your fingerprint to an iPhone, for example, you have to press your thumb to the home button about 20 times.  These prints are recorded as a Training Set, which trains the iPhone to recognize your thumb in different positions.  Each print in the Training Set is stored as a list of Features, or predictors that show what makes your thumb different from everyone else's (source).

Fingerprints have been used for identification for over 120 years, and no two have been found to be alike.  However, are other biometrics, such as heartbeats, able to identify a person out of a group?  Surprisingly, the answer is yes.  Heartbeats are collected using Electrocardiogram (EKG) machines.  (If you have watched any hospital show, these are those beeping machines next to patients.)

For a semester project in my Biology class, I decided to make an EKG recognition program myself.  I recruited ten of my classmates and read their heartbeats with an EKG recorder three times.  For each person, two recordings were put into the Training Set, and the third was used for calculating the accuracy of the algorithm.  After all the samples were taken, I wrote a quick Python script to extract features from the EKG readings.  The script looked at the local maxima and minima of each heartbeat, as well as the distances between them.  Then, I used a Bagging Classification algorithm to see if I could guess the identity of each subject out of a group, using only their heartbeat.
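The original script is not shown here, so the following is only a sketch of the two steps described above: pulling local maxima and minima out of a signal, then classifying with bagging (training several simple learners on resampled copies of the Training Set and taking a majority vote).  The synthetic "recordings", the three chosen features, the nearest-neighbour base learner, and all function names are assumptions made for illustration.  A library implementation such as scikit-learn's `BaggingClassifier` would be the more usual choice in practice.

```python
import random
from collections import Counter


def local_extrema(signal):
    """Return (index, value) pairs for the local maxima and minima of a signal."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if prev < cur > nxt:
            maxima.append((i, cur))
        elif prev > cur < nxt:
            minima.append((i, cur))
    return maxima, minima


def mean(values):
    return sum(values) / len(values) if values else 0.0


def extract_features(signal):
    """Features: mean peak height, mean trough depth, mean distance between peaks."""
    maxima, minima = local_extrema(signal)
    peak_positions = [i for i, _ in maxima]
    gaps = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    return [mean([v for _, v in maxima]), mean([v for _, v in minima]), mean(gaps)]


def nearest_label(train, features):
    """Base learner: label of the closest training example (1-nearest neighbour)."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(train, key=squared_distance)[1]


def bagging_predict(train, features, n_estimators=25, seed=0):
    """Train each base learner on a bootstrap sample, then take a majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_estimators):
        sample = [rng.choice(train) for _ in train]  # sample with replacement
        votes[nearest_label(sample, features)] += 1
    return votes.most_common(1)[0][0]


def synthetic_beat(height, depth, period, beats=5):
    """Hypothetical stand-in for an EKG trace: one spike and one dip per beat."""
    beat = [0.0] * period
    beat[1], beat[3] = height, -depth
    return beat * beats


# Two training recordings per (made-up) subject, mirroring the experiment's split.
train = [
    (extract_features(synthetic_beat(1.0, 0.5, 6)), "subject_a"),
    (extract_features(synthetic_beat(1.1, 0.5, 6)), "subject_a"),
    (extract_features(synthetic_beat(2.0, 1.0, 8)), "subject_b"),
    (extract_features(synthetic_beat(2.1, 1.0, 8)), "subject_b"),
]
held_out = extract_features(synthetic_beat(1.05, 0.5, 6))
prediction = bagging_predict(train, held_out)
print(prediction)
```

Real EKG traces are much noisier than these synthetic beats, so a real script would likely need smoothing before looking for peaks.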

Surprisingly, the prediction algorithm could correctly identify 8 out of the 10 subjects on the first guess.  While not close to perfect, this is far better than random guessing, which would on average correctly identify only 1 of the 10 subjects.  On average, the program took 2 guesses to correctly identify a subject.

A main source of error in the experiment was the quality of the lab's EKG machines.  While hospital-grade EKGs have accurate recordings like the picture above, the school's EKG recordings looked more like sine waves.  In addition, more error could exist in the data collection because I am an amateur at recording heartbeats.

After developing the heartbeat recognition algorithm, I confirmed that a person's heartbeat is unique enough to correctly identify them out of a group of people.

Saturday, March 3, 2018

Genetic Algorithms 1

A genetic algorithm is an algorithm based on the biological theory of natural selection.  Genetic algos are meta-heuristics, which search for "good enough" solutions when optimizing a process.  A program using genetic algorithms starts off with a 'population' of algorithms, with randomly valued characteristics.  The algorithms with the least desired behavior are discarded.  The rest of the algorithms are 'bred': two algos combine their behavior, with a chance of random mutation.  This process repeats over and over, with the worst performers discarded and the best performers carrying their 'genes' on to the next generation.
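The loop described above can be sketched in a few lines of Python.  This is a generic illustration, not the simulator's code: the fitness function, the target gene values, the population size, and the mutation settings are all assumptions chosen to keep the example short.

```python
import random


def evolve(fitness, gene_count=2, pop_size=30, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve lists of float genes toward higher fitness and return the best one."""
    rng = random.Random(seed)
    # Start with a population of randomly valued genes.
    population = [[rng.random() for _ in range(gene_count)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: discard the worse-performing half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Breeding: each child takes every gene from one of two random parents,
        # with a small chance of random mutation.
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            child = [g + rng.gauss(0, 0.05) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical fitness: genes closer to an arbitrary target score higher.
TARGET = (0.8, 0.2)


def closeness(genes):
    return -sum((g - t) ** 2 for g, t in zip(genes, TARGET))


best = evolve(closeness)
print(best, closeness(best))
```

Because the survivors are carried over unchanged, the best score in the population can never get worse from one generation to the next.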

To illustrate this behavior, I coded a very basic genetic algorithm to optimize playing the infamous Flappy Bird game.  In this simulation, each bird has two genes: one gene determines how often the bird jumps when the pipe is in front of the bird, and another gene for how often the bird should jump when the pipe is above or below the bird.  For cosmetic purposes, each initial bird starts off with a random color.  Newly born birds have a mixed color between their parents.
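One way to represent such a bird in Python is sketched below.  The field names, the choice to store each gene as a jump probability between 0 and 1, and the averaging of parent colors are assumptions for illustration; the actual simulator may store and mix these differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Bird:
    jump_rate_ahead: float   # how often to jump when the pipe is in front
    jump_rate_offset: float  # how often to jump when the pipe is above or below
    color: tuple             # (red, green, blue), cosmetic only


def random_bird(rng):
    """An initial bird: random genes and a random color."""
    color = tuple(rng.randrange(256) for _ in range(3))
    return Bird(rng.random(), rng.random(), color)


def breed(mom, dad, rng, mutation_rate=0.1):
    """Child inherits each gene from one parent and a color mixed from both."""
    def inherit(a, b):
        gene = rng.choice((a, b))
        if rng.random() < mutation_rate:
            # Small random mutation, clamped to the valid range.
            gene = min(1.0, max(0.0, gene + rng.uniform(-0.1, 0.1)))
        return gene

    color = tuple((m + d) // 2 for m, d in zip(mom.color, dad.color))
    return Bird(inherit(mom.jump_rate_ahead, dad.jump_rate_ahead),
                inherit(mom.jump_rate_offset, dad.jump_rate_offset),
                color)


rng = random.Random(0)
chick = breed(random_bird(rng), random_bird(rng), rng)
print(chick)
```

Under this averaging assumption, repeated breeding pulls every color toward a shared middle shade, so a population descended from a few survivors ends up looking alike.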

In the first few seconds of the video, the initial birds have a wide variety of genes.  Some of the birds instantly dive into the ground or fly off the screen.  Since crashing before the pipes even appear on screen is an unfavorable trait, these birds are not given the chance to reproduce.  Since the population shrinks every time birds crash, a random surviving bird will mate with another bird and produce an offspring with traits similar to its parents.

When the first pipe comes on screen, about two thirds of the birds fail to fly through the gap.  The survivors then instantly reproduce.  You can see the bird genes are evolving because after the tenth pipe, almost no birds collide into the obstacles.  You may also notice that unlike the brightly colored birds at the beginning of the simulation, the current generation of birds are all the same dull yellow color due to an evolutionary bottleneck (the obstacles).

As the simulation went on, fewer and fewer birds crashed into obstacles.  In the first ten seconds, 114 birds crashed.  In the next ten seconds, only 46 more birds were lost.  By 30 seconds, only an additional 15 birds hit a pipe.

Many genetic algorithms rely on neural networks for calculation.  Neural nets involve complex functions that have multiple inputs (the environment) and multiple outputs (how the individual should act).  This Flappy simulator did not use a neural net since there were only two inputs and one output.  Because the inputs and outputs were linearly related, the birds' movement could be calculated using linear regression.

The purpose of this simulator is to show a very basic demonstration of what a genetic algorithm can do with just a couple hours of programming.

Here is the source code and executable file:
Source code
Downloadable game

Monday, January 29, 2018

Downloading iOS Software

(Work in Progress)

Downloading the software to program iOS apps requires two components: Xcode, and the iOS SDK.
Xcode is a development environment for making apps, similar to the AI2 interface.
The iOS SDK is a software program that lets developers run iOS apps on physical phones and upload apps to the App Store (for a $99 annual fee).

To start, download Xcode.  One caveat of iOS development is that Xcode is only available on the Mac App Store.  However, it is possible to download Xcode on virtual machines.  Once Xcode is downloaded, you can start coding.  However, unless the iOS SDK is downloaded, you will not be able to launch your app on a physical device or put your creation on the App Store.

Next, create an account at iTunes Connect.  When prompted to enter your credit card information for purchasing the iOS SDK, make sure to use a card in your own name, and not a parent's or relative's.  If you use a parent's credit card for the purchase, your info will not match the card holder's, and your purchase will be delayed for up to a few weeks.  (I found this out the hard way.)

Wednesday, November 29, 2017

Startup Pitches And Decks

Jyve PitchBreakfast Pitch:

This is a five-minute pitch by Jyve after graduating ICAT, with a Q&A session afterwards.

Monotto 1Million Cups Pitch:

While this is not a five-minute pitch, several teams that have continued after ICAT pitch to 1MC in order to practice presenting, get feedback from professionals, and find valuable connections.

This is a recommended pitch deck template for entrepreneurs seeking VC funding with Sequoia Capital.  While this template is meant for companies seeking funding, the structure is similar to pitches in ICAT.

Airbnb's pitch deck:

Tuesday, November 28, 2017

Adding QR Codes To Flyers

A popular and effective way to advertise your app is with flyers.  Some flyers simply have a description of your company and some text saying "download at".  However, many people may overlook your site simply because they don't feel like manually typing in your app's URL or searching for it on the App Store.  Luckily, there is an alternative: QR codes.  You can add a QR code that potential users can scan with their phones to visit your website.

QRCode Monkey offers a free QR code generator that lets you customize the color and style of your QR code link, as well as put a logo inside the code.  This lets your flyer catch the eye of users and makes them more likely to download your app.

Here is an example of a QR code made with the website mentioned above.  Txtra is a company created in ICAT last semester.