Keio University Shonan Fujisawa Campus
Course Summary (Syllabus)
INTERNET MEASUREMENT AND DATA ANALYSIS （Kenjiro Cho）
1. Objectives/Teaching method
In this class, you will learn methods for collecting and analyzing data
on the Internet, and gain knowledge and understanding of networking
technologies and large-scale data analysis.
Each class covers a specific topic, introducing both the technologies
and the theories behind them.
In addition to the lectures, each class includes programming exercises
through which you will build practical data analysis skills.
2. Materials/Reading List
The lecture slide materials will be provided online.
ruby: http://www.ruby-lang.org/
gnuplot: http://gnuplot.info/
[1] Mark Crovella and Balachander Krishnamurthy.
Internet Measurement: Infrastructure, Traffic, and Applications.
Wiley, 2006.
[2] Pang-Ning Tan, Michael Steinbach and Vipin Kumar.
Introduction to Data Mining.
Addison-Wesley, 2006.
[3] Raj Jain.
The Art of Computer Systems Performance Analysis.
Wiley, 1991.
[4] Toby Segaran.
Programming Collective Intelligence.
O'Reilly Media, 2007.
[5] Allen B. Downey.
Think Stats: Probability and Statistics for Programmers.
O'Reilly Media, 2011.
[6] Chris Sanders.
Practical Packet Analysis, 2nd Edition.
No Starch Press, 2011.
3. SCHEDULE
#1 Introduction
  - Big Data and Collective Intelligence
  - Internet measurement
  - Large-scale data analysis
  - exercise: introduction to the Ruby scripting language
#2 Data and variability
  - Summary statistics
  - Sampling
  - How to make good graphs
  - exercise: graph plotting with Gnuplot
#3 Data recording and log analysis
  - Network management tools
  - Data format
  - Log analysis methods
  - exercise: log data and regular expressions
#4 Distribution and confidence intervals
  - Normal distribution
  - Confidence intervals and statistical tests
  - Distribution generation
  - exercise: confidence intervals
  - assignment 1
#5 Diversity and complexity
  - Long tail
  - Web access and content distribution
  - Power-law and complex systems
  - exercise: power-law analysis
#6 Correlation
  - Online recommendation systems
  - Distance
  - Correlation coefficient
  - exercise: correlation analysis
#7 Multivariate analysis
  - Data sensing and Geo-Location
  - Linear regression
  - Principal Component Analysis
  - exercise: linear regression
#8 Time-series analysis
  - Internet and time
  - Network Time Protocol
  - Time-series analysis
  - exercise: time-series analysis
  - assignment 2
#9 Topology and graph
  - Routing protocols
  - Graph theory
  - exercise: shortest-path algorithm
#10 Anomaly detection and machine learning
  - Anomaly detection
  - Machine Learning
  - SPAM filtering and Bayes' theorem
  - exercise: naive Bayesian filter
#11 Data Mining
  - Pattern extraction
  - Classification
  - Clustering
  - exercise: clustering
#12 Search and Ranking
  - Search systems
  - PageRank
  - exercise: PageRank algorithm
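Several of the exercises above, such as summary statistics (#2) and confidence intervals (#4), come down to a few lines of Ruby. The following is a minimal sketch, not taken from the course materials: the helper names and sample values are hypothetical, and the interval uses the normal approximation (z = 1.96), which assumes a reasonably large sample; for small samples a Student's t quantile would be more appropriate.

```ruby
# Hypothetical helpers for illustration; not from the course materials.

# Arithmetic mean of an array of numbers.
def mean(xs)
  xs.sum.to_f / xs.size
end

# Sample standard deviation (n - 1 in the denominator).
def stddev(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

# Median: middle value of the sorted data,
# or the average of the two middle values when n is even.
def median(xs)
  s = xs.sort
  n = s.size
  n.odd? ? s[n / 2].to_f : (s[n / 2 - 1] + s[n / 2]) / 2.0
end

# 95% confidence interval for the mean, assuming the normal
# approximation (z = 1.96); returns [lower, upper].
def confidence_interval_95(xs)
  half_width = 1.96 * stddev(xs) / Math.sqrt(xs.size)
  m = mean(xs)
  [m - half_width, m + half_width]
end

if __FILE__ == $PROGRAM_NAME
  samples = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0] # made-up values
  lower, upper = confidence_interval_95(samples)
  printf("mean=%.3f median=%.3f sd=%.3f\n",
         mean(samples), median(samples), stddev(samples))
  printf("95%% CI: [%.3f, %.3f]\n", lower, upper)
end
```

Running the script prints the sample mean, median, and standard deviation, followed by the bounds of the interval; replacing `samples` with values read from a measurement log turns it into the kind of exercise listed in #4.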
4. Assignments/Examination/Grade Evaluation
5. Special Note
The prerequisites for this class are basic programming skills and a basic
knowledge of statistics.
In the exercises and assignments, you will need to write programs to
process large data sets, using the Ruby scripting language and the
Gnuplot plotting tool.
To understand the theoretical aspects, you will need basic knowledge
of algebra and statistics. However, the focus of the class is on how
mathematics is applied in engineering.
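The Ruby-then-Gnuplot workflow described above typically means reducing raw data with a script and writing plain whitespace-separated columns that Gnuplot can read. A minimal sketch, assuming input files with one numeric value per line: the script counts values into fixed-width bins and prints "bin_start count" pairs. The bin width (10.0), the script name, and the file names are hypothetical choices for illustration.

```ruby
# Hypothetical sketch: bin numeric values into a histogram and print
# two whitespace-separated columns (bin start, count) for Gnuplot.

# Count values into bins of the given width.
# Returns a Hash of bin_start => count, ordered by bin_start.
def histogram(values, bin_width)
  counts = Hash.new(0)
  values.each do |v|
    bin_start = (v / bin_width).floor * bin_width
    counts[bin_start] += 1
  end
  counts.sort.to_h
end

# Format the histogram as one "bin_start count" line per bin.
def to_gnuplot_data(counts)
  counts.map { |bin_start, n| "#{bin_start} #{n}" }.join("\n") + "\n"
end

# Only runs when data files are named on the command line,
# e.g. (hypothetical): ruby hist.rb data.txt > hist.dat
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  values = ARGF.each_line.map(&:strip).reject(&:empty?).map(&:to_f)
  print to_gnuplot_data(histogram(values, 10.0)) # bin width is an assumption
end
```

The resulting file can then be plotted from within Gnuplot with, for example, `plot "hist.dat" using 1:2 with boxes`. Keeping the Ruby side limited to producing plain columns keeps the data reusable for other Gnuplot plot styles.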
6. Prerequisites / Related courses
7. Conditions to take this course
8. Relation with past courses
9. Course URL
2014-07-07 11:40:28.79667
