Friday, May 15, 2009

Setting up an Ubuntu mirror manually


OK, I've confirmed that my mirror works, so I'll describe the manual process of building your own mirror.

The simplest way to build your mirror is to follow the instructions provided by others (google: ubuntu mirror). The drawback is that you will need to download roughly 5.5 GB for the main component alone (the universe and multiverse components are even bigger).

As I wrote earlier, I purchased the entire repository on DVD from a place called LinuxStore.ca, so the steps below seed the mirror from those discs instead of downloading everything.

1. Follow the instructions from http://www.howtoforge.com/local_debian_ubuntu_mirror to install apt-mirror and apache2.

2. Comment out all the entries in the /etc/apt/mirror.list file. Add a single entry like so (for easy testing):

deb http://closestmirror.ubuntu.com/ubuntu jaunty main

3. Run apt-mirror and break out (Ctrl+C) when it starts downloading packages.
Then do a line count (wc -l) of the following files:

/var/spool/apt-mirror/var/NEW
/var/spool/apt-mirror/var/ALL
/var/spool/apt-mirror/var/MD5

NEW and ALL should have the same number of lines, which means apt-mirror would download every file listed in ALL.

4. Copy the pool directory and all its subdirectories and files into the mirror (e.g. from the install CD):

cp -R /media/cdrom/pool/main \
  /var/spool/apt-mirror/mirror/closestmirror.ubuntu.com/ubuntu/pool/

5. Run apt-mirror again and break out again when it begins downloading. Check the NEW and ALL files again: NEW should now have fewer lines than ALL. This means that apt-mirror has accepted the files you copied into your mirror, so you will be able to update your mirror regularly (as designed by the creator of apt-mirror) as if it had downloaded all the files from the Internet.

6. Lastly, don't use the clean.sh script. I thought it would only clean up the packages that are outdated, but I was mistaken: I believe it builds a script which cleans up *everything* from your mirror. For reference, the apt-mirror script decides what to download by comparing the size of each file in your mirror directory against an index file which it downloads from the external mirror. If they don't match, the name of the package is added to NEW for download.
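
The checks in steps 3 and 5, and the size comparison described in step 6, can be sketched in Python. This is only an illustration under my own assumptions, not apt-mirror's code (apt-mirror itself is a Perl script): the helper names count_lines, mirror_progress and needs_download are made up for this example, and the directory is the one used in the steps above.

```python
import os

VAR_DIR = "/var/spool/apt-mirror/var"  # directory holding NEW, ALL and MD5


def count_lines(path):
    """Return the number of lines in a file (roughly what `wc -l` reports)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def mirror_progress(var_dir=VAR_DIR):
    """Compare NEW against ALL.

    Returns (total, still_to_download, already_present).
    """
    total = count_lines(os.path.join(var_dir, "ALL"))
    pending = count_lines(os.path.join(var_dir, "NEW"))
    return total, pending, total - pending


def needs_download(local_path, expected_size):
    """Hypothetical stand-in for the size check described in step 6:
    a file is fetched when it is missing or its size differs from the
    size listed in the index."""
    if not os.path.isfile(local_path):
        return True
    return os.path.getsize(local_path) != expected_size
```

After step 3, mirror_progress() should report zero files already present; after copying the pool directory in step 4 and re-running apt-mirror, the third number should be greater than zero.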



And that's it. Pretty simple, huh?

Virtualization

I've been exploring virtualization technology for a year or so. At first I was interested in checking out Xen (since it was open source), but it appeared complicated and I didn't want to get involved in a compilation/setup cycle just yet. Recently I've had the chance to actually try out a few products, and here are some of my experiences with them.


I've tried VMware Workstation in the past. It works well and it allows you to explore working with VMs on your own desktop. My problem with it is that when given a choice of working in a VM versus working on the host, you naturally choose the host. I wanted a hypervisor (or a lightweight OS) which would not distract me from working with the VMs.


So I downloaded and tested the different hypervisors available: XenServer and KVM. There is also VMvisor from VMware, but I figured it was similar to XenServer. As far as I can tell they all do basically the same job.


XenServer and VMvisor are bare-metal hypervisors, which means each is a lightweight OS that runs directly on the physical hardware and manages it. They provide the resources for running VMs on top of the hypervisor. I installed XenServer first. It was an easy installation on my workstation. After installation you get a curses interface to the OS.


But I didn't know how to create VMs. I searched for the documentation online (it looks like good documentation) and discovered that I have to use a separate client to access the VM manager and create VMs. So what I was losing by going with a hypervisor was the use of a desktop. These hypervisors are meant for servers or other headless systems. Losing the use of my new monitor wasn't part of the plan, so I decided to ditch XenServer.



I moved on to KVM. KVM stands for Kernel-based Virtual Machine, and it is the virtualization technology Ubuntu chose. Xen technology was purchased by Citrix Systems a while ago. Xen is still provided as open source but the enterprise product requires a license. So I wondered, why KVM for Ubuntu? The reason given on the KVM site is that the development effort was minimized because it utilizes the Linux kernel. Why build a lightweight OS with scheduling of processes and workloads when there is an excellent open source OS available already? So what KVM does is add a module to the kernel that traps whenever certain instructions are encountered. Oh yeah, KVM assumes you have virtualization capabilities provided by the CPU (Intel VT-x or AMD-V). So I tried it.
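
Since KVM depends on CPU support, it is worth checking for it before installing anything. Here is a minimal Python sketch, assuming a Linux host that exposes /proc/cpuinfo; the function name virtualization_flags is my own. On x86 the flag "vmx" indicates Intel VT-x and "svm" indicates AMD-V. Note that the flag can be present while the feature is still disabled in the BIOS, so treat this as a first check only.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text.

    'vmx' marks Intel VT-x and 'svm' marks AMD-V.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found.update(f for f in value.split() if f in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = virtualization_flags(handle.read())
    except OSError:
        flags = set()  # not a Linux host, or /proc is unavailable
    print("hardware virtualization:", ", ".join(sorted(flags)) or "not detected")
```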



It took some time but I enjoyed playing with KVM. It's an interesting set-up, and the documentation is OK. KVM by itself only gives the kernel the capability of running VMs; to actually run one you also need an emulator for the virtual hardware. In practice that emulator is QEMU (KVM is used with a modified build of it). In other words, KVM turns your Linux host into a hypervisor. The good thing is that you still have the desktop (GUI) from which to run VMs. The difficulty is that the extra software layers make the solution less than simple. Libvirt is one of them: a library which gives you a standard front-end to your VM solution, whether that is KVM/QEMU or Xen. Libvirt isn't strictly necessary, of course. So the stack is libvirt => QEMU => KVM.

What's cool about libvirt is that you can manage the VMs via a shell console (virsh) or a GUI console (virt-manager). Anyway, I enjoyed working through the solution.



I didn't settle on KVM (although I'd like to). It has a lot going for it. Principally, it's open source, and second, it doesn't shield you from its capabilities. The only drawback is its complexity. It took a little time to get started. But the reason I went back to VMware Workstation is the performance of QEMU. With non-Linux OSs QEMU just didn't perform very well. I installed Windows XP and Solaris 10 using QEMU and it was slow. Very slow. On top of that, converting a QEMU-created VM (i.e. qcow2 format to vmdk format) didn't necessarily mean that I could take it to VMware Player and play it back. I could have tried KQEMU but it was another level of complexity that was eating up more of my time. So I'm back to using VMware.
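
For reference, the disk conversion mentioned above is done with qemu-img. A small Python sketch that builds and (optionally) runs the command follows; the function names and the file names in the usage note are hypothetical, and as noted above a successful conversion does not guarantee that the guest will boot under VMware.

```python
import subprocess


def convert_command(source, target):
    """Build the qemu-img command line that converts a qcow2 image to VMDK."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "vmdk", source, target]


def convert(source, target):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.check_call(convert_command(source, target))
```

For example, convert("winxp.qcow2", "winxp.vmdk") would run qemu-img convert -f qcow2 -O vmdk winxp.qcow2 winxp.vmdk.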



Long story short, I would prefer to use KVM/libvirt/QEMU if QEMU performed better out of the box. I would also like to have a document describing the architecture of the solution, to ease new users into these technologies. Finally, based on some of the documentation I read, it opens up clustering and cloud computing, which I'm also interested in exploring. So it has a lot of interesting technologies going for it. But for my current purposes it's complicated to set up and maintain. I hoped to use both KVM and VMware at the same time, but it's not possible as both add a module to the kernel to run virtualization. So they are mutually exclusive.
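
To see which of the two is currently holding the kernel, you can look at the loaded modules. A minimal Python sketch, assuming a Linux host with /proc/modules: the module names kvm, kvm_intel and kvm_amd belong to KVM and vmmon belongs to VMware, while the function name loaded_hypervisors is my own.

```python
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}
VMWARE_MODULES = {"vmmon"}


def loaded_hypervisors(proc_modules_text):
    """Return which hypervisor kernel modules appear in /proc/modules text."""
    # The first field of every line in /proc/modules is the module name.
    names = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    result = set()
    if names & KVM_MODULES:
        result.add("kvm")
    if names & VMWARE_MODULES:
        result.add("vmware")
    return result


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            print(sorted(loaded_hypervisors(handle.read())))
    except OSError:
        print("/proc/modules is not available on this system")
```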

Wednesday, May 13, 2009

Ubuntu Mirror

I've been working with Ubuntu exclusively on my workstation since April 24th (the Jaunty release). I've also been exploring virtualization technology. I gave XenServer and KVM a try but settled on VMware, only because the files are portable (so I can take them to work). I'll write about the different products another time.

My intent was to create a VM so that I can run some genealogy software; a VM for the obvious reason that it is easy to back up. Funny thing is that after installing Ubuntu (a few times) it appeared to me that Ubuntu *required* a connection to the Internet to install. I found out after some experimentation that this isn't the case. Anyway, I wanted to set up a mirror so that I didn't need an Internet connection for every install. However, I didn't want to set up a mirror by downloading the entire Ubuntu repository (22+ GB).

So I experimented with apt-mirror and here are some of my results.

- I purchased the entire Ubuntu repository from a store called LinuxStore.ca (LinuxStore.org.uk).
- So instead of downloading the mirror I wanted to copy it from my set of DVDs.
- I followed the instructions for setting up apt-mirror.
- Instead of pointing mirror.list to a mirror like http://ca.archive.ubuntu.com/ubuntu I followed an instruction to add the line: deb file:/media/cdrom/ubuntu jaunty main
- This didn't work, so I read the /usr/bin/apt-mirror Perl script.
- The problem with the script is that it strips "http://" from the mirror name and creates a folder with the remaining name under /var/spool/apt-mirror/skel/. Since my URL was file:/media/cdrom/ubuntu, this just didn't work.
- So what I ended up doing was keeping a single entry in the mirror.list file and mirroring that over the network (i.e. jaunty-security main, 66 MB).
- I then looked at the file structure apt-mirror had created and copied the rest of the files from the DVD into the same pattern.
- I confirmed that the locally copied files were in the right place by running apt-mirror again and checking that it only updated packages instead of downloading everything.
- Therefore I know that copying the universe and multiverse components of the repository from the DVD will work correctly.
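
The URL problem from the list above can be illustrated with a few lines of Python. This is a sketch of the behaviour as I understood it from reading the script, not a translation of apt-mirror's Perl code; the function name skel_dir_for is made up.

```python
import posixpath

SKEL_DIR = "/var/spool/apt-mirror/skel"


def skel_dir_for(mirror_url, skel_dir=SKEL_DIR):
    """Drop a leading 'http://' and use the rest as a directory under skel/."""
    prefix = "http://"
    if mirror_url.startswith(prefix):
        name = mirror_url[len(prefix):]
    else:
        name = mirror_url  # nothing is stripped from other URL forms
    return posixpath.join(skel_dir, name)


print(skel_dir_for("http://ca.archive.ubuntu.com/ubuntu"))
print(skel_dir_for("file:/media/cdrom/ubuntu"))
```

The first call gives a sensible directory named after the host; the second keeps the "file:" prefix in the path, which is the case that failed for me.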

Quick summary

- If I could do this again I would not buy the entire repository. Downloading "main" and "restricted" is good enough.
- Creating a mirror isn't necessary. You can point any VM's sources.list file to a local directory, an NFS-mounted file system or a mounted DVD ISO, and apt-get will retrieve the files from there.
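
A sources.list line has the form "deb URI distribution component(s)". The small Python helper below formats such a line for a local directory; the helper name is my own, and the path is the DVD mount point used earlier in this post (substitute your own directory or NFS mount).

```python
def sources_list_entry(uri, distribution, components):
    """Format one 'deb' line: type, URI, distribution, then the components."""
    return " ".join(["deb", uri, distribution] + list(components))


# DVD mounted at /media/cdrom, as in the mirror.list experiment above.
print(sources_list_entry("file:/media/cdrom/ubuntu", "jaunty", ["main", "restricted"]))
```

The printed line goes into /etc/apt/sources.list on the VM, followed by an apt-get update.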

Ubuntu Install

/dists/jaunty/main/binary-i386/Packages.gz contains the list of packages for the component (i.e. main).
/pool/main/a, /pool/main/b, ... etc. contain all the packages.
Running apt-mirror for a component, e.g. main, checks the repository for the most recently available packages and updates your mirror. This is the benefit of using apt-mirror.
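
To show how the index ties the two directories together, here is a minimal Python sketch that reads the stanzas of an uncompressed Packages index. Each stanza names a package, its path under pool/ (Filename) and its size in bytes (Size). The function name parse_packages is my own, and the sample stanzas are invented for illustration; they are not real Jaunty entries.

```python
def parse_packages(text):
    """Yield one dict of fields per stanza of a Debian 'Packages' index."""
    for stanza in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += "\n" + line.strip()  # continuation line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields


# Invented sample data, shaped like entries in a Packages file.
SAMPLE = """\
Package: hello
Version: 1.0-1
Filename: pool/main/h/hello/hello_1.0-1_i386.deb
Size: 12345

Package: world
Version: 2.0-3
Filename: pool/main/w/world/world_2.0-3_i386.deb
Size: 678
"""

for entry in parse_packages(SAMPLE):
    print(entry["Package"], entry["Filename"], int(entry["Size"]))
```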

Question: Is the archive of apt-mirror the same as the archive stored at /var/cache/apt/archives?
When Update Manager runs, it saves the updated packages to /var/cache/apt/archives.

Sunday, May 10, 2009

Chinese computing

I've been learning Mandarin Chinese for 2 years now. I've been attending a Mandarin language school with Tzu Chi http://en.tzuchi.ca/canada/home.nsf/home/index . My teacher is 傅老師 (Teacher Fu) and we've been learning from a book named 五百字說華語 (Speak Mandarin in Five Hundred Words).

Problem: I recently installed Ubuntu Linux on my computer at home. On Windows I had figured out how to install and use the built-in Chinese input method. I used the Windows New Phonetic 2002a code table because it allowed me to type pinyin and get traditional Chinese characters as output. Ubuntu has an input method (IM) framework so that you can add multiple language input methods and create your own easily. It's called SCIM (Smart Common Input Method). Unfortunately it has multiple input methods but not one suited to me. It does have Pinyin, but that produces simplified characters. Not being able to find the one I wanted, I went online to read about all of them. I found that 五筆字型 (wu bi zi xing) was the coolest. It's difficult, that's for sure, but it's also efficient once you've acquired the skill and the memory for the character strokes. Since the system is built around how the characters are written with a brush, it means that I kill 2 birds with one stroke (if you will). I can learn characters, character strokes and an input method all at once. And when I finally build up the repertoire of characters I won't (necessarily) be deficient in any particular area (that is the hope anyway).

It was a problem at first, because 五筆字型 produces simplified characters. Further research indicated that there is more than one wubi method. wubi86 supported the GB86 character set. There was an update called GBK which included traditional characters as well. However, GBK wasn't very popular because it wasn't backward compatible with wubi86. Wubi 2000 then provided support for GB 18030-2000, which was backward compatible with GB86 and also contained traditional characters. I was looking for a wubi code table which supported GB 18030-2000, but I couldn't find one.

Lucky for me, after starting this entry for my blog I discovered a method for looking up input codes for various Chinese characters (until I actually know some of these codes), and I discovered that the wubi input in Linux does have traditional characters in its table. For example, the characters 謝 (traditional) and 谢 (simplified) both have the same wubi code, "ytmf". So I'm not limited to a particular character set after choosing a specific input method.

My next step is to begin cataloguing the characters I've learned from my Chinese textbook and putting them into a table for quicker lookup. I'm also contemplating how to produce a database to store the various methods of locating characters.
There are various systems available for looking up characters: radicals, pinyin, ... I will probably begin by cataloguing these methods to get a better picture of how to go about searching.
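
As a starting point, the lookup table could be as simple as a Python dictionary from character to wubi code, with a reverse index built from it. This is just a sketch of the idea; the names WUBI_CODES and characters_by_code are my own, and the only entries are the two characters mentioned above.

```python
from collections import defaultdict

# Character -> wubi code. Only the pair from this post; more to be added.
WUBI_CODES = {
    "謝": "ytmf",  # traditional
    "谢": "ytmf",  # simplified
}


def characters_by_code(table):
    """Build the reverse index: wubi code -> list of characters."""
    index = defaultdict(list)
    for char, code in table.items():
        index[code].append(char)
    return dict(index)


print(WUBI_CODES["謝"])
print(characters_by_code(WUBI_CODES)["ytmf"])
```

Other lookup methods (radical, pinyin) could later be added as extra columns keyed on the same character.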

More later.