
Review a few important neural network architectures, including VGG, ResNet, GoogLeNet (Inception), and MobileNet.

Since AlexNet was published in 2012, many architectures have been developed to significantly improve accuracy, increase the depth of neural networks, and reduce model size and the number of operations. Here I study and review a few important developments.

Let’s first get a big picture of these architectures in terms of accuracy, size, operations, inference time, and power usage. The comparison comes from a 2016 paper, so it doesn’t include MobileNet or other later developments.

Figure 1 shows 1-crop top-1 accuracies of the most relevant entries submitted to…

A brief review of boosting, gradient boosting, gradient boosting decision tree (GBDT), and XGBoost

Boosting is a statistical ensemble method, in contrast to bagging (bootstrap aggregating). Bagging trains each base classifier independently and averages the predictions. Boosting trains the base classifiers sequentially and uses the “residuals” from the previous classifier to train the next classifier. The generic framework of boosting consists of additive models and forward stepwise learning.
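The residual-fitting loop described above can be sketched as follows. This is only a minimal illustration, assuming squared-error loss, one-dimensional inputs, and regression stumps as base learners; the names `fit_stump`, `boost`, and `mse`, the learning rate, and the toy data are all hypothetical and not taken from the original note.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to (xs, residuals) by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all xs identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model built stage by stage: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)          # stage 0: a constant model
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (made up for illustration): a noisy step function.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
print("MSE with 0 rounds: ", mse(boost(xs, ys, n_rounds=0), xs, ys))
print("MSE with 20 rounds:", mse(boost(xs, ys, n_rounds=20), xs, ys))
```

For squared-error loss the residual is exactly the negative gradient of the loss, which is why this loop is also the simplest case of gradient boosting.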

Forward stepwise method

Although Step 2.a just mathematically represents the goal in this step with a single equation, it is the essential step to actually train a new classifier. Step 2.a depends on the loss function…


  • Choose “subject” — units of diversion
  • Choose “population” — equivalent population
  • Size
  • Duration and Exposure

Choosing the unit of diversion and the population is an iterative process: try out some decisions, then see what they imply for both the size and the duration of the experiment. Depending on the results, we will need to revisit the decisions and iterate.

The unit of diversion answers the question of how to assign events to either the control or the experiment. Even though the metric is computed based on the events (e.g. page view), the unit of diversion decides how these page…

Variability: Analytical vs. Empirical

Use A/A tests to

  • Compute variance and confidence interval based on the assumption of the distribution (usually normal distribution)
  • Directly compute the confidence interval without any assumption of the distribution
  • Compare empirical results to analytical results (sanity check)

For example, consider 20 A/A experiments with 50 users per group, where the click-through probability (CTP) of each group is computed from its 50 users. The following table shows the 20 experiments (one per row). Take the first row for example. Based on the clicks and pageviews of the 50 users in each of Group 1 and Group 2, the CTPs are 0.1 and 0.04. The difference is…
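A rough sketch of the analytical-versus-empirical comparison, using simulated data rather than the table from the note. The assumptions are loud ones: every user has a single pageview, both groups share a hypothetical true CTP of 0.1, and the function names and the seed are invented for illustration.

```python
import math
import random
import statistics


def simulate_aa_differences(n_experiments=20, users_per_group=50,
                            true_ctp=0.1, seed=0):
    """Simulate A/A experiments and return the CTP difference of each one."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_experiments):
        # Each user clicks with probability true_ctp (assumed identical in both groups).
        clicks_1 = sum(rng.random() < true_ctp for _ in range(users_per_group))
        clicks_2 = sum(rng.random() < true_ctp for _ in range(users_per_group))
        diffs.append(clicks_1 / users_per_group - clicks_2 / users_per_group)
    return diffs


def analytical_se(p, n1, n2):
    """Standard error of the difference of two proportions with common rate p."""
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


diffs = simulate_aa_differences()
print("empirical SD of differences:", statistics.stdev(diffs))
print("analytical SE:              ", analytical_se(0.1, 50, 50))
```

If the two numbers are of similar size, the analytical variance is a reasonable model for the metric; a large gap suggests the distributional assumption does not hold.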

A mixture from multiple textbooks and online resources

A typical way of solving a classification problem is to find a hyperplane in the feature space. The algorithms that use this approach include SVM and logistic regression. (The hyperplane of logistic regression is the one passing through y = 0.5. How does logistic regression find that hyperplane? By fitting the logistic function to the data points.)

Given a point x0 and a line wT*x + b = 0, the functional margin between the point and the line is

functional margin = wT*x0 + b
geometric margin = (wT*x0 + b) / ||w||
min 1/2*||w||^2
s.t. yi(wT*xi…
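As a small numeric check of the two margin formulas above, here is a sketch in plain Python. The function names and the example values of w, b, and x0 are hypothetical; the label y is left out, matching the formulas as written in the text.

```python
import math


def functional_margin(w, b, x0):
    """wT*x0 + b for a point x0 and the hyperplane wT*x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x0)) + b


def geometric_margin(w, b, x0):
    """Functional margin divided by ||w||: the signed distance to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x0) / norm


w, b, x0 = [3, 4], -5, [3, 4]
print(functional_margin(w, b, x0))   # 3*3 + 4*4 - 5 = 20
print(geometric_margin(w, b, x0))    # 20 / 5 = 4.0

# Rescaling (w, b) changes the functional margin but not the geometric one.
w2, b2 = [6, 8], -10
print(functional_margin(w2, b2, x0), geometric_margin(w2, b2, x0))
```

The scale invariance of the geometric margin is what allows the functional margin to be fixed by convention, leaving an objective that depends only on ||w||.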

Lesson 1: Overview of A/B Testing

A/B testing consists of choosing a metric, reviewing statistics, designing experiments, and analyzing results. It is a general control/experiment methodology used online to test a new product or feature. For example, two groups of users interact with two versions of a website; their activities are recorded, metrics are computed from those activities, and the metrics are used to evaluate the two versions. A variety of things can be tested, from new features and additions to your UI to a different look for your website. Examples:

  • Amazon launching personalized recommendations increases the revenue
  • visible things: Google…

These are some notes reviewing the statistics I needed while studying Lesson 1 of Udacity A/B testing. Specifically, they cover the binomial distribution converging to the normal distribution when n is large. Here is a more basic note I wrote previously on the intuition behind the CLT and confidence intervals, mostly assuming a normal distribution.

In Udacity A/B testing session 1, the instructors reviewed how to compute the confidence interval of the estimated probability p of a binomial distribution. When n is very large, the binomial distribution tends to converge to the normal distribution. Thus, the same formula to estimate the mean…
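A minimal sketch of the normal-approximation interval for a binomial proportion, assuming n is large enough for the approximation to hold. The function name `binomial_ci` and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def binomial_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Made-up example: 100 clicks out of 1000 users.
low, high = binomial_ci(100, 1000)
print(f"p_hat = 0.100, 95% CI = ({low:.4f}, {high:.4f})")
```

A common rule of thumb is to use this approximation only when both n*p_hat and n*(1 - p_hat) are comfortably above 5.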

Graph basics

Vertex (V), Edge (E)

Undirected and directed graphs: for an undirected graph, the handshaking lemma gives sum(degree(v)) = 2|E|

Adjacency list: O(|V|+|E|) * w space, where w is the word size. Advantages: 1) efficient for sparse graphs; 2) multiple graphs can use the same nodes

Adjacency matrix: O(|V|²) * 1 bit, good for dense graphs

OOP: each graph uses its own set of nodes, good for clean code
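To make the two representations concrete, here is a short sketch that builds both from the same edge list and checks the handshaking lemma. The helper names and the four-vertex example graph are made up for illustration.

```python
def build_adjacency_list(n_vertices, edges):
    """Undirected graph as a dict: vertex -> list of neighbours."""
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def build_adjacency_matrix(n_vertices, edges):
    """Undirected graph as an n x n matrix of 0/1 entries."""
    matrix = [[0] * n_vertices for _ in range(n_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency_list(4, edges)
matrix = build_adjacency_matrix(4, edges)

# Handshaking lemma: every edge contributes to the degree of two vertices.
degree_sum = sum(len(neighbours) for neighbours in adj.values())
print(degree_sum, 2 * len(edges))
```

The list stores one entry per edge endpoint, while the matrix always stores |V|² entries, which is the trade-off between sparse and dense graphs noted above.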

Breadth-first search (BFS)

Graph representation: adjacency list

Goal: traverse the connected component of a graph from a starting node, level by level

Application: find the shortest path from a starting…
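A minimal BFS sketch on an adjacency list, assuming an unweighted graph stored as a dict of neighbour lists. The function names and the example graph are hypothetical.

```python
from collections import deque


def bfs_levels(adj, start):
    """Visit the connected component of `start` level by level.

    Returns (level, parent): level[v] is the number of edges on a shortest
    path from start to v, and parent[v] is the node v was discovered from.
    """
    level = {start: 0}
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:          # first discovery is at the shortest distance
                level[v] = level[u] + 1
                parent[v] = u
                queue.append(v)
    return level, parent


def shortest_path(parent, target):
    """Follow parent pointers back to the start; None if target was never reached."""
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "z": [],              # isolated node, not reachable from "a"
}
level, parent = bfs_levels(adj, "a")
print(level)
print(shortest_path(parent, "e"))
print(shortest_path(parent, "z"))
```

Each vertex is enqueued at most once and each adjacency list is scanned once, so the running time is O(|V|+|E|).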

The null hypothesis: H0

The alternative hypothesis: Ha

Normal distribution and Z statistic vs. t distribution and t statistic

For inference on a single mean, suppose we sample from a normal distribution.

  • When the population variance is known, if H0 is true, the test statistic is the z statistic and it has the standard normal distribution.
  • When the population variance is unknown, the sample variance is an estimate, which changes some fundamental math. If H0 is true, the test statistic is the t statistic and it has the t distribution with n-1 degrees of freedom. Compared to the standard normal distribution, the t distribution has a lower peak and…
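The two test statistics can be computed side by side with the standard library. This is a sketch under assumptions: the sample, the hypothesized mean, and the "known" population standard deviation are made-up values, and the function names are hypothetical. P-values are omitted because the t distribution is not available in the standard library.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """Test statistic when the population standard deviation sigma is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """Test statistic when sigma is estimated by the sample standard deviation.

    Under H0 it follows a t distribution with n-1 degrees of freedom.
    """
    n = len(sample)
    s = statistics.stdev(sample)      # uses the n-1 denominator
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


sample = [5.1, 4.9, 5.0, 5.2, 4.8]    # made-up measurements
print("z:", z_statistic(sample, mu0=4.8, sigma=0.2))
print("t:", t_statistic(sample, mu0=4.8), "with", len(sample) - 1, "degrees of freedom")
```

The only difference between the two functions is which standard deviation enters the denominator; the extra uncertainty from estimating it is what the heavier tails of the t distribution account for.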

Create an AWS instance.

Save the .pem file, cd to the folder, and run ‘ssh -i xxx.pem’. The ssh information can be found by clicking “connect” on the instance.

Set up Docker

sudo dpkg --configure -a
sudo apt install
sudo usermod -a -G docker $USER (find $USER using whoami)
log out and log in again
test using “docker run hello-world”

Change the Security Group to open a port

After setting up Docker, Anaconda, and Clipper on AWS, run the Clipper deployment. Because the port for the Clipper application is 1337, create a Security Group with a Custom TCP Rule for port 1337.

Set up S3 storage

Follow Getting Started Guide to…


machine learning and data science
