Wlodzislaw Duch
H6429 fall 2006 course, first assignment.


We have finished the lectures on visualization, so it is a good time for your first assignment, due on the 1st of October, at the end of the recess week.

  1. Pick your favorite multivariate visualization method: either one that I have presented or another that you find through the WWW links. Note that the emphasis here is on learning from visualization, not on dimensionality reduction or on a super learning system for regression!
    Here you will find some links to visualization methods, more visualization links, links to SOM and its applications. You may also find some new methods browsing the Internet on your own. Search for non-linear dimensionality reduction, isometric mapping; multidimensional scaling; stochastic proximity embedding; reducing the dimensionality of data with neural networks etc.

    Many interesting papers with novel methods are below:
    manifold learning; more manifold learning;
    visualization of hidden nodes in neural networks;
    visualization of neural network decisions;
    SVM - kernel visualization.

  2. To avoid duplication of the same methods applied to similar problems, please let me know what you plan to do; send me an email with your name, software name, method and data, and I shall put it on this page and let the others know that it is taken. The assignment should not be a part of your thesis, as this would be an unfair advantage over other students.
  3. Describe the method in detail in your formal report or on a Web page; the description should be sufficiently detailed to explain the algorithm and to understand what the software is doing and how to interpret it, but should not be a verbatim copy of other people's work! The point is to understand the method and then write about it yourself.
  4. Find software that implements it and describe it briefly. If you choose a larger software package, like XGobi, select one or two methods, don't describe the whole package!
  5. Find some interesting multidimensional data: either generate some data yourself, or grab some data from the network. Iris is not an interesting data set and you will not learn much from it! Some interesting data sets for visualization are found here; please let me know if you find more interesting data repositories, and I shall add them there. You may also use your own data.
  6. Prepare your data (change of format or standardization may be needed), and visualize it.
  7. Analyse the usefulness of the results; what can we learn from this exercise? Perhaps some hypothesis can be proved/disproved, perhaps interesting structure in the data will be noticed.
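
As an illustration of the standardization mentioned in step 6, here is a minimal sketch in plain Python. It assumes the data is a list of equal-length numeric rows; the function name `standardize` and the toy values are hypothetical and not part of any package listed on this page. Most visualization tools offer their own scaling options, so treat this only as a reminder of what standardization does.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Return a z-scored copy of `rows` (a list of equal-length numeric lists)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    scales = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, scales)]
        for row in rows
    ]


# Hypothetical toy data: two features measured on very different scales.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize(data)
```

After this step every feature has zero mean and unit variance, so distance-based methods such as MDS or SOM are not dominated by the feature with the largest numeric range.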

Electronic copy submission is sufficient: please zip or compress all files, give the file your name, write in the email Your-Name, Paper Title, Method Used, Software Used, Data Used (this will be placed in the Table below), and send it to me. Please keep this format to make it easy for me; renaming all your files from "assignment2" etc. to your names is not fun.
If more than one file is sent, please zip or compress them, give the file Your Name and send it to me.
Try to minimize the size of the files; I want to attach them to this page and I do not have much space on the WWW!

Q & A:

1. Is a formal report required? If so, how long should it be?
Your paper is a report. The length should be sufficient for others who have taken this course to understand the method and interpret the results.

2. As visualization is clearer on screen, should we design a webpage instead of writing a report?
Web pages are OK, we may put them into the e-dventure space, but please send me the files.

3. What must we do to score high marks for this assignment? Do we have to study the visualization technique in much greater depth than what was covered in the lectures?
Find an interesting visualization method, either a new variant of one of the methods I talked about or maybe a quite new one; find interesting data, visualize it, provide an interpretation of the visualization, describe what you have learned from the visualization, and mention potential applications of this knowledge.

Please note that if you do not send me complete data I will not put it into the table!

Topics taken for the first assignment in 2006. The maximum number of points is 10.

No | Name | Paper | Method | Software | Data | Remarks | Mark
41 | Annamala Sarayu Parimal | MDS for ZOO+fMRI data | MDS | XLSTAT | ZOO/FMRI | Very good paper but too much copied verbatim from the Internet | 8
30 | Ardian Kristanto Poernomo | HiT-MDS for cDNA | HitMDS | HitMDS code+GGobi | Gene-Drug Correlation | Interesting method, but not clear what is the data, what is in figures, what has been learned ... | 4
28 | Bramandia Ramadhana | Polygonal Line Principal Curves | Principal Curves, Polygonal Line | Kegl Java program | Kegl examples | Describes the topic quite well, but misses the point - no interesting data, nothing learned about the data! | 6
39 | Chen Tze Chiang | PCA for Boston housing data | PCA | XLStat 2006 | Boston housing data | Many errors: covariance formula is wrong! Cov may be negative! It is a nonnegative definite matrix. Feature vector is not a matrix. Good analysis of results. | 6
1 | Chia Yong Sang, Alex | Visualization of High-D Complex Dataset using Relational Perspective Map | Relational Perspective Map | Visumap | Credit scoring (Munich), SFB386 | Cox did not develop MDS, otherwise great theory, data description and analysis. | 10
5 | Cheu Eng Yeow | Radviz visualization of wine data | Radviz visualization | Orange | Wine | Great paper! | 10
21 | Chong Yee Seng | Pima-Indian-Diabetes Using PCA/Kernel PCA | PCA | Matlab | Heart Disease, UCI | Kernel PCA is very briefly described; RBC=>RBF; there are many RBF functions, what is RBF kernel? Argument? Overall good data analysis. | 7
8 | Dang Xuan Hong | PCA in data visualization | PCA | Matlab | Wine + Wisconsin Breast Cancer | Good although general intro on PCA; short description of software; GhostMiner plots are shown but not referenced; 1D PCA looks much better in N-dots plot; which components are most important? What have we learned? | 6
11 | Han Shuguo | Data Visualization Using FDA/Kernel FDA | FDA and Kernel FDA | Matlab+STPRTool | Ionosphere | How is separab calculated? What are the straight lines on figures? To separate classes in Fig. 4 the line should be shifted to the right. What can we learn from these figures? | 7
37 | Hu Meiqun | PCA/MDS for image segmentation | PCA | GhostMiner | Image Segmentation | Very good description of theory & data, interesting experiments | 9
27 | Hu Meishan | SOM visualization on Glass Identification | SOM | Yale | Glass | Some editorial corrections needed but overall nice paper, well focused on data analysis | 8
7 | Huang Dong | Wine recognition with FDA | FDA | STPRtool+WD FDA implementation | Wine | Theory is fine, many pictures have been generated, FDA in 2D is suitable for data separation but we have not learned much about the data. | 7
20 | Huang Yi | Restoring Ink Bleed-Through Degraded Documents | PCA + k-means clustering | Matlab | Own images | Interesting application, well focused | 8
10 | Iti Chaturvedi | Visualization of Gene Regulatory Networks using Bayesian Networks | Dynamic Bayesian Networks | GeneNetworks | S. Cerevisiae Microarray data (cell-cycle) | Novel for this course, dynamic networks, well done although less focused on data | 9
12 | John Felix Charles Joseph | Comparison of classical MDS, LTSA & ISOMAP | MDS, LTSA and ISOMAP | Manifold Learning Toolbox, Matlab | Non-Symbolic Features of KDD Cup 1999 | Interesting methods; classical scaling is not MDS, interesting data, outliers should be removed in Fig. 4 | 8
43 | Koh Chin Wei, Eugene | Visualizing Low-level Audio Features Using SOM | SOM | SOM Toolbox 2.0, SDH Toolbox | Audio, own collection | Interesting experiments and very good description of data and methods and great analysis | 10
34 | Le Minh Nghia | Handwritten digits visualization using Diffusion Map | Diffusion map | MANIfold learning | Handwritten digits | Error in conjugate formula, but interesting method, data and interpretation | 9
25 | Maggy Anastasia Suryanto | Visualization with Locally Linear Embedding | LLE | Manifold Toolbox | Wisconsin Cancer | Good description of LLE, weaker on interpretation | 7
31 | Mohamad Hirwan | HitMDS for cDNA on Barley data | MDS | HitMDS | Barley seed expression | No references to methods/data, some symbols not explained, a single experiment made | 6
22 | Nah Hock Choon, Edwin | Principal curves for hand-written characters | Principal curves | Kegl et al | NIST database of handwritten characters | Good algorithm description, but not much data analysis and learning from data. | 6
35 | Nai Hong Hwa Francis | Visualization of high-dimensional data with relational perspective map | Relational Perspective Map (RPM) | VisuMap | Ecoli Proteins | traveling abroad ... | 0
2 | Nguwi Yok Yen | Road Sign Visualization with PCA & Emergent SOM | PCA+Emergent SOM | Databionic ESOM Tools | Roadsigns | Interesting data and well described, although sometimes confusing | 9
24 | Nguyen Luong Dong | SOM for country data | SOM | MATLAB+SOM Toolbox | 4 continents/4 categories country data | Only basic SOM forms description; some inaccurate statements, analysis not too useful | 6
15 | Nguyen Minh Nhut | Visualization using locally linear embedding | LLE and PCA | Matlab toolbox | Ionosphere | Description of PCA and LLE up to the point; not much learned about the data itself | 7
18 | Nguyen Trung Hieu | Learning from visualization | Laplacian Eigenmap | MANIfold learning | Digits with pressure info; Boston housing | Interesting and well described method; not much on software but several nice experiments; ref. 1 incomplete | 9
33 | Pham Manh Tung | Isometric feature mapping | ISOmap method | MANIfold learning | Pen-Based Handwritten Digits | Rather informal description of methods, several papers quoted; not much on software and not much learned about data | 7
23 | Phua Si Jie | SOM and ViSOM for Handwritten Digits | Visualization-induced SOM (ViSOM) | SOM+SPRTool Toolbox | Pen-Based Handwritten Digits | Very nice work! | 10
3 | Puah Wee Choo | Nonmetric Multidimensional Scaling for Visualization | Nonmetric MDS | Matlab EDA toolbox | Shortest inter-depot traveling time | Great description of non-metric MDS; quite unusual data, and interesting analysis | 10
32 | Ronny | ICA for Blood Vessels Extractions in Retinal Images | ICA | ICALAB | Retinal Images | Fine description of ICA and software; interesting data but not much on experiments and learning about data. | 7
19 | Sim Sian Hui Kelvin | SOM for stock clustering | SOM | SOM toolbox | Financial ratios of S&P 500 stocks | Good SOM/U-matrix description; many experiments, novel input processing, interesting conclusions. | 10
26 | Song Hengjie | Visualization with SOM | SOM+scatterograms | Yale | Iris | Methods and tools are fine, but for Iris it is hard to draw interesting conclusions; scatterograms in x3, x4 give more info ... | 7
40 | Tan Wi-Meng, Javan | Interpretability of Visualization Techniques Using Scatterograms & Star Glyphs | Scatterograms & Star Glyphs | XmdvTool | Cars | Statistic is wrong: cars have min 3 cylinders not 5.5, no. cylinders and origin have interesting correlations; good focus on data | 6
29 | Teng Teck Hou | Statistical Analysis of Forest Cover Type (djvu format) | PCA, KMeans, histograms | Visual C++ | Forest Cover Type | Not much on methods and software, but many figures showing histograms, scatterograms, clusterized scatterograms and PCA, and numerous observations made. | 8
17 | Tu Tong | Visualizations of Signatures using Laplacian Eigenmaps | Laplacian Eigenmaps | Manifold Learning Toolbox | Signatures, gpdsSIGNATURE database | Interesting method, well described, despite poor feature extraction some lessons are learned. | 9
9 | Umair Rafique | Visualization of Neural Network Decisions | WD Projection | WD Matlab+Netlab package | Glass and Letters | Interesting experiments to learn about neural network performance rather than data itself. | 8
42 | Wan Kong Wah | SOM Elucidating Structures in Multimedia Content | SOM | SOM+VOICEBOX Toolboxes | Multimedia WAV data of tennis, classical music and pop song | Short theory but very interesting data and experiments; "fir-elipse"? Fur Elise. | 10
14 | Wang Di | MDS for basketplayers | MDS | Permap | NBA basketball player | Theory described only verbally; detailed software description, interesting data and experiments | 10
16 | Wang Lin | Visualization of PCA and FDA on Waveform Data | PCA/FDA | STPRTool | Monk + Waveform | Very good on theory, data well suited for the methods, nice experiments. | 10
36 | Woo Huizhen, Jessica | Sammon+curvlinear distance for mass spectrometry data | Sammon mapping and Curvilinear data analysis | MZmine | Fatty acid amide hydrolase (FAAH) | Description of CCA and data not too clear (how many peaks have been used?), data interpretation is fine. | 7
38 | Wu Min | PCA, Kernel PCA and MDS for Zoo data | PCA, Kernel PCA & Classical MDS | STPRTools and XLSTAT | Zoo | Theory kept on basic level, software not described, not much learned from the data | 6
4 | Yeong Sui Sum | Scatter Plot Matrix | SOM | SOM in Maple or Matlab | Body fat and circumferences | Not much to say ... | 4
6 | Zhang Xuejie | PCA/Kernel PCA for Glass identification | PCA & Kernel PCA | STPRTool | Glass data | Great paper in all respects! | 10
13 | Zhao Guopeng | Analyzing Extreme Learning Machine by Visualization | WD Projection | WD Matlab+Netlab package+ELM Matlab code | K-category data | Interesting problem, great analysis! | 10
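
One remark in the table points out that covariance entries may be negative even though the covariance matrix is nonnegative definite. The following small sketch in plain Python illustrates that point; the function names and the toy data are hypothetical and chosen only for this illustration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of `rows` (a list of equal-length numeric lists)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    d = len(columns)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def quadratic_form(matrix, v):
    """Compute v^T M v; this is never negative for a nonnegative definite M."""
    size = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(size) for j in range(size))


# Hypothetical toy data: the second feature decreases as the first one grows.
data = [[1.0, 6.0], [2.0, 4.0], [3.0, 2.0]]
cov = covariance_matrix(data)
```

Here the off-diagonal entry `cov[0][1]` is negative because the two features move in opposite directions, while `quadratic_form(cov, v)` stays nonnegative for any vector `v`, since it equals the sample variance of the data projected on `v`.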