# Neural network architectures

A review of a few important neural network architectures, including VGG, ResNet, GoogLeNet (Inception), and MobileNet.

Since AlexNet was published in 2012, many architectures have been developed to improve accuracy, increase the depth of neural networks, and reduce both model size and the number of operations. Here I study and review a few important developments.

Let’s start with the big picture: how these architectures compare in accuracy, size, number of operations, inference time, and power usage. The comparison comes from a 2016 paper, so it doesn’t include MobileNet or other later developments.

Figure 1 shows 1-crop top-1 accuracies of the most relevant entries submitted to…

# XGBoost

A brief review of boosting, gradient boosting, gradient boosting decision trees (GBDT), and XGBoost

Boosting is a statistical ensemble method that contrasts with bagging (bootstrap aggregating). Bagging trains each base classifier independently and averages the predictions. Boosting trains the base classifiers sequentially and uses the “residuals” from the previous classifier to train the next classifier. The generic framework of boosting consists of an additive model and forward stepwise learning.
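As a rough illustration of the sequential, residual-fitting idea, here is a minimal sketch of boosting for regression with squared loss. The base learner (a one-split regression stump), the learning rate, the number of rounds, and all function names are assumptions made for this example; they are not taken from the original post or from the XGBoost library.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to (x, residual) pairs by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all x identical): predict the mean residual.
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.5):
    """Additive model built stage by stage; each stump fits the current residuals."""
    base = sum(ys) / len(ys)          # stage 0: a constant prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

With squared loss the residual is exactly the negative gradient of the loss with respect to the current prediction, which is why fitting residuals is the simplest case of gradient boosting.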

Although Step 2.a just mathematically represents the goal in this step with a single equation, it is the essential step to actually train a new classifier. Step 2.a depends on the loss function…

# Udacity A/B Testing Lesson 4: Designing an Experiment

## Overview

• Choose “subject” — units of diversion
• Choose “population” — equivalent population
• Size
• Duration and Exposure

Designing an experiment is an iterative process: try out some choices for the unit of diversion and the population, then see what they imply for the size and the duration of the experiment. Depending on the results, we may need to revisit those choices and iterate.

The unit of diversion answers the question “how do we assign events to either the control or the experiment group?” Even though the metric is computed based on events (e.g. page views), the unit of diversion decides how these page…
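To make the idea concrete, here is a minimal sketch that assumes the unit of diversion is a user ID: every event is routed by hashing the ID of the user who produced it, so all of one user's page views land in the same group. The function name, the experiment name, and the 50/50 split are hypothetical choices for illustration.

```python
import hashlib


def assign_group(unit_id: str, experiment_name: str = "demo-experiment") -> str:
    """Deterministically map a unit of diversion to 'control' or 'experiment'."""
    key = f"{experiment_name}:{unit_id}".encode("utf-8")
    # Hash to a stable bucket in [0, 100); the same unit always gets the same bucket.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "experiment" if bucket < 50 else "control"


# Events are page views, but assignment depends only on the user who produced them.
events = [("user-1", "/home"), ("user-2", "/home"), ("user-1", "/pricing")]
assignments = [(user, page, assign_group(user)) for user, page in events]
```

Including the experiment name in the hash key means that the same user can fall into different groups in different experiments, which avoids correlating assignments across experiments.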

# Udacity A/B Testing Lesson 3: Choosing and Characterizing Metrics

## Variability: Analytical vs. Empirical

Use A/A tests to

• Compute variance and confidence interval based on the assumption of the distribution (usually normal distribution)
• Directly compute the confidence interval without any assumption of the distribution
• Compare empirical results to analytical results (sanity check)

For example, suppose we run 20 A/A experiments with 50 users per group, and in each experiment we compute the click-through probability (CTP) of each group from its 50 + 50 users. The following table shows the 20 experiments (20 rows). Take the first row as an example. Based on the clicks and pageviews of the 50 users in Group 1 and in Group 2, the CTPs are 0.1 and 0.04. The difference is…
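The comparison between analytical and empirical variability can be sketched with a small simulation. This is not the data from the table: it assumes each user contributes one pageview and clicks independently with a made-up true CTP of 0.1, and the function names and the seed are hypothetical.

```python
import math
import random


def simulate_aa_differences(n_experiments=20, users_per_group=50,
                            true_ctp=0.1, seed=0):
    """Simulate A/A experiments and return the CTP difference of each one."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_experiments):
        clicks_1 = sum(rng.random() < true_ctp for _ in range(users_per_group))
        clicks_2 = sum(rng.random() < true_ctp for _ in range(users_per_group))
        differences.append(clicks_1 / users_per_group - clicks_2 / users_per_group)
    return differences


def analytical_standard_error(ctp, users_per_group):
    """Standard error of the difference of two CTPs, from the binomial variance."""
    return math.sqrt(ctp * (1 - ctp) * 2 / users_per_group)


def empirical_standard_deviation(differences):
    """Sample standard deviation of the observed differences (no distribution assumed)."""
    mean = sum(differences) / len(differences)
    variance = sum((d - mean) ** 2 for d in differences) / (len(differences) - 1)
    return math.sqrt(variance)


differences = simulate_aa_differences()
analytical = analytical_standard_error(0.1, 50)       # 0.06 under these assumptions
empirical = empirical_standard_deviation(differences)
```

If the two numbers are close, the analytical formula is a reasonable description of the metric; a large gap suggests that its distributional assumption does not hold for this metric.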

# Notes for Reviewing SVM

A mixture from multiple textbooks and online resources

A typical way of solving a classification problem is to find a hyperplane in the feature space. Algorithms that use this approach include SVM and logistic regression. The hyperplane of logistic regression is the one where the predicted probability y = 0.5. How does logistic regression find that hyperplane? By fitting the logistic function to the data points.

Given a point x0 and a line wT*x + b = 0, the functional margin and the geometric margin between the point and the line are

`functional margin = wT*x0 + b`

`geometric margin = (wT*x0 + b) / ||w||`

`min 1/2*||w||^2`

`s.t. yi(wT*xi…`
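The two margin definitions can be checked numerically with a short sketch. The function names and the example values of `w`, `b`, and `x0` are chosen purely for illustration.

```python
import math


def functional_margin(w, b, x0):
    """wT*x0 + b: changes when (w, b) is rescaled."""
    return sum(wi * xi for wi, xi in zip(w, x0)) + b


def geometric_margin(w, b, x0):
    """(wT*x0 + b) / ||w||: the signed distance from x0 to the line."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x0) / norm


w, b, x0 = [3.0, 4.0], -5.0, [3.0, 4.0]
print(functional_margin(w, b, x0))   # 20.0
print(geometric_margin(w, b, x0))    # 4.0

# Doubling w and b describes the same line: the functional margin doubles,
# while the geometric margin stays the same.
w2, b2 = [2 * wi for wi in w], 2 * b
print(functional_margin(w2, b2, x0))  # 40.0
print(geometric_margin(w2, b2, x0))   # 4.0
```

This scale invariance is why the geometric margin, not the functional margin, is the quantity that is meaningful to maximize.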

# Lesson 1: Overview of A/B Testing

A/B testing consists of choosing a metric, reviewing statistics, designing experiments, and analyzing results. A/B testing is a general control/experiment methodology used online to test a new product or feature. For example, two groups of users interact with two versions of a website; their activity is recorded, metrics are computed from that activity, and the metrics are used to evaluate the two versions. A variety of things can be tested, from new features and additions to your UI to a different look for your website. Examples:

• Amazon launching personalized recommendations increases the revenue

These are some notes reviewing the statistics I needed while studying Lesson 1 of Udacity A/B testing. Specifically, they cover how the binomial distribution converges to the normal distribution when n is large. A more basic note that I wrote previously covers the intuition behind the CLT and confidence intervals, mostly assuming a normal distribution.

In Lesson 1 of Udacity A/B testing, the instructors reviewed how to compute the confidence interval of the estimated probability p of a binomial distribution. When n is very large, the binomial distribution converges to the normal distribution. Thus, the same formula to estimate the mean…
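A minimal sketch of the normal-approximation confidence interval for a binomial proportion follows. The function name and the example counts (100 clicks out of 1000 users) are hypothetical, and z = 1.96 assumes a 95% confidence level.

```python
import math


def binomial_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a binomial proportion.

    Assumes n is large enough for the normal approximation to hold.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical example: 100 clicks out of 1000 users.
lower, upper = binomial_confidence_interval(100, 1000)
```

For small n, or for p close to 0 or 1, the approximation becomes unreliable and an exact binomial interval is a safer choice.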

# Review Intro to Algorithms (Graph)

Graph basics

Vertex (V), Edge (E)

Undirected and directed graphs: for an undirected graph, the handshaking lemma gives sum(degree(v)) = 2|E|

Adjacency list: O(|V|+|E|) * w space, where w is the word size. The advantages are that 1) it suits sparse graphs; 2) multiple graphs can share the same nodes

Adjacency matrix: O(|V|²) * 1 bit, good for dense graphs

OOP: each graph uses its own set of nodes, good for clean code

Goal: traverse the connected component of a graph from one starting node, level by level

Application: find the shortest path from a starting…
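Traversing level by level is what breadth-first search does. Here is a minimal sketch on an adjacency-list graph stored as a dictionary; the function names and the small example graph are made up for illustration.

```python
from collections import deque


def bfs_levels(adjacency, start):
    """Visit the connected component of `start` level by level.

    Returns (level, parent): level[v] is the number of edges on a shortest
    path from start to v, and parent[v] is the node v was discovered from.
    """
    level = {start: 0}
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in level:          # first visit is the shortest one
                level[neighbor] = level[node] + 1
                parent[neighbor] = node
                queue.append(neighbor)
    return level, parent


def shortest_path(parent, target):
    """Follow parent pointers back from target; None if target was not reached."""
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "z": [],                                   # a separate component
}
level, parent = bfs_levels(graph, "a")
print(level["e"])                  # 3
print(shortest_path(parent, "e"))  # ['a', 'b', 'd', 'e']
```

Each vertex enters the queue at most once and each adjacency list is scanned once, so the traversal takes O(|V|+|E|) time.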

# Hypothesis testing, part 1 for one mean

The null hypothesis: H0

The alternative hypothesis: Ha

Normal distribution and Z statistic vs. t distribution and t statistic

For inference on one mean, suppose we sample from a normal distribution.

• When the population variance is known, if H0 is true, the test statistic is the z statistic and it follows the standard normal distribution.
• When the population variance is unknown, the sample variance is used as an estimate, which changes some of the underlying math. If H0 is true, the test statistic is the t statistic and it follows the t distribution with n-1 degrees of freedom. Compared to the standard normal distribution, the t distribution has a lower peak and…
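The two cases above differ only in which standard deviation goes into the denominator. A minimal sketch using the standard library follows; the function names, the sample, the hypothesized mean `mu0`, and the known `sigma` are hypothetical values for illustration.

```python
import math
import statistics


def one_sample_z_statistic(sample, mu0, sigma):
    """z statistic when the population standard deviation sigma is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def one_sample_t_statistic(sample, mu0):
    """t statistic when sigma is unknown; returns (t, degrees of freedom)."""
    n = len(sample)
    s = statistics.stdev(sample)   # sample standard deviation, divides by n - 1
    t = (statistics.mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, sample mean 5
z = one_sample_z_statistic(sample, mu0=4, sigma=2)
t, degrees_of_freedom = one_sample_t_statistic(sample, mu0=4)
```

The t statistic is then compared against a t distribution with `degrees_of_freedom` degrees of freedom rather than against the standard normal distribution.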

# AWS Setup

Create an AWS instance.

Save the .pem file, cd to its folder, and run `ssh -i xxx.pem ubuntu@xxxx.com`. The ssh command can be found by clicking “Connect” on the instance.

# Set up Docker

• `sudo dpkg --configure -a`
• `sudo apt install docker.io`
• `sudo usermod -a -G docker $USER` (https://techoverflow.net/2017/03/01/solving-docker-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket/)
• Find the value of `$USER` with `whoami`
• Test with `docker run hello-world`

# Change the Security Group to open a port

After setting up Docker, Anaconda, and Clipper on AWS, run the Clipper deployment. Because the port for the Clipper application is 1337, create a Security Group with a Custom TCP Rule for port 1337.

# Set up S3 storage 