February 13, 2007 at 01:06 PM | categories: apache, unix, http
1 342 625 990 / 105 314 712 = 12.748703
Yes, log files contain a lot of redundant information. I enjoy seeing over 12X compression on a file!
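The first line is just the uncompressed size divided by the compressed size. Assuming both figures are byte counts (the post doesn't say which tool produced the compressed file), here is the same arithmetic as a small sketch. The helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: how many times smaller the compressed file is.
// Both sizes are in bytes; compressedBytes must be non-zero.
double compressionRatio(std::uint64_t originalBytes, std::uint64_t compressedBytes) {
    return static_cast<double>(originalBytes) / static_cast<double>(compressedBytes);
}

// Fraction of the original size saved by compression
// (negative if the "compressed" file is actually larger).
double spaceSaved(std::uint64_t originalBytes, std::uint64_t compressedBytes) {
    return 1.0 - static_cast<double>(compressedBytes) / static_cast<double>(originalBytes);
}
```

With the numbers above, `compressionRatio(1342625990, 105314712)` is about 12.7487 and `spaceSaved()` is about 0.92. In other words, the compressed log takes up roughly 7.8% of the original space.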
October 15, 2006 at 03:15 AM | categories: programming, http
I've been working on my web server again, for fun. Last weekend, I refactored a lot of code and added dynamic MIME typing and CGI support! I was going to continue fleshing it out today, but I had to do even more refactoring to clean up some messy code paths.
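Dynamic MIME typing usually comes down to mapping a file's extension to a Content-Type value. The sketch below assumes an extension-based lookup that falls back to `application/octet-stream`. The function name `mimeTypeFor` and the table contents are illustrative, not Shelob's actual code.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical lookup: map a request path's extension to a Content-Type value.
std::string mimeTypeFor(const std::string& path) {
    static const std::map<std::string, std::string> types = {
        {"html", "text/html"},
        {"htm",  "text/html"},
        {"css",  "text/css"},
        {"txt",  "text/plain"},
        {"gif",  "image/gif"},
        {"jpg",  "image/jpeg"},
        {"png",  "image/png"},
    };
    const std::string fallback = "application/octet-stream";

    std::string::size_type dot = path.rfind('.');
    std::string::size_type slash = path.rfind('/');
    // No extension at all, or the last dot belongs to a directory name.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return fallback;

    std::string ext = path.substr(dot + 1);
    // Compare case-insensitively so "INDEX.HTML" matches too.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::map<std::string, std::string>::const_iterator it = types.find(ext);
    return it != types.end() ? it->second : fallback;
}
```

A table like this could just as easily be loaded at startup from a `mime.types`-style file. That would be the "dynamic" part, but the lookup stays the same.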
I broke the Http::sendFile() method into several new ones and moved HTTP/1.1 keep-alive handling into a more central location, instead of leaving it tacked on to the first place where it worked. One thing that is driving me slightly insane is that there appears to be a tiny memory leak. I can't figure out where it is happening, since I eliminated almost all dynamic allocations in the stack. As I wrote that sentence, I think I figured out where it could be coming from, but I'll need to rewrite some more code to fix it. It is so small that you don't start to notice it until after at least 1,000 requests.
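Centralizing keep-alive handling mostly comes down to one decision per request: close the connection, or wait for another request. Under HTTP/1.1, connections are persistent by default unless the client sends `Connection: close`. An HTTP/1.0 client has to ask for persistence explicitly with `Connection: keep-alive`. The sketch below implements that rule, assuming the protocol version and the raw Connection header value have already been parsed. `shouldKeepAlive` and its helpers are hypothetical names.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Lower-case a copy of s and strip surrounding spaces and tabs.
static std::string normalizeToken(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(" \t");
    std::string out = s.substr(b, e - b + 1);
    for (std::string::size_type i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    return out;
}

// True if the comma-separated Connection header lists the given option.
static bool connectionHas(const std::string& header, const std::string& option) {
    std::istringstream in(header);
    std::string token;
    while (std::getline(in, token, ','))
        if (normalizeToken(token) == option) return true;
    return false;
}

// Hypothetical helper: decide whether to keep the socket open after a response.
// version is the request's protocol string, e.g. "HTTP/1.1";
// connection is the raw Connection header value ("" if absent).
bool shouldKeepAlive(const std::string& version, const std::string& connection) {
    if (connectionHas(connection, "close")) return false;
    if (version == "HTTP/1.1") return true;            // persistent by default
    return connectionHas(connection, "keep-alive");    // HTTP/1.0 must opt in
}
```

The server's per-connection loop would call this once after each response and go back to reading only when it returns true. In a forking server, an idle timeout is also worth having so that a silent keep-alive client can't hold a process forever.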
Overall, I'm happy with the design so far. This is my first C++ program and my first serious "server" program. I didn't do any real upfront design beyond drawing on past experience and my gut. I've added a number of features, and the design has proven extensible. OOP purists would probably frown on it, but I'm using the subset of C++, and of OOP in general, that makes sense to me and is practical for what I'm doing. Is there a cleaner way? Likely. *shrug*
I'm getting to the point where a couple of design patterns probably make sense. This program is growing organically, but under tight enough constraints that it isn't turning into a mess (at least not yet). The other thing that is starting to make sense is unit testing. I've been doing more refactoring than adding new features, and it would be awesome to be able to run a test suite and know that I haven't broken anything. I'm not even really sure where to begin on that, but it is obvious that Shelob is becoming more of a "real" program and less of a toy.
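One low-ceremony place to begin is plain test functions that exercise small, pure pieces of the server and report every failure. These can be compiled into a separate test binary with no framework dependency. The sketch below assumes that setup. `CHECK`, `trim` and `runTrimTests` are hypothetical names, and `trim` simply stands in for whatever helper is worth testing.

```cpp
#include <iostream>
#include <string>

// Minimal check macro: report a failure and keep going, so one run shows every broken case.
static int g_failures = 0;
#define CHECK(expr)                                                          \
    do {                                                                     \
        if (!(expr)) {                                                       \
            ++g_failures;                                                    \
            std::cerr << __FILE__ << ":" << __LINE__                         \
                      << ": CHECK failed: " #expr "\n";                      \
        }                                                                    \
    } while (0)

// Hypothetical function under test: strip leading/trailing spaces, tabs and CR.
std::string trim(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// One test function per unit. A test-only main() would call each one and
// return the failure count as its exit status, so a script can detect breakage.
int runTrimTests() {
    CHECK(trim("  Host: example  ") == "Host: example");
    CHECK(trim("value\r") == "value");
    CHECK(trim("   ") == "");
    CHECK(trim("") == "");
    return g_failures;
}
```

Running that binary after every refactoring session gives a quick answer to "did I break anything?" without pulling in a testing library.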
A couple more good weekends and it would actually be semi-useful.
May 27, 2006 at 11:58 AM | categories: security, code, http, internet
I fixed up the parsing issues in Shelob so that the parser is somewhat respectable, instead of a bunch of hacks. Once I started looking at what the client was actually sending me (the Live HTTP Headers Firefox extension rocks), it was obvious that I needed to break the request into lines and then separate each header line into a name and a value.
After rewriting the getHeaders() function to use STL hash tables, the code is not only more flexible but also cleaner. For example:
[code]
// Log client IP, request line, status code, bytes sent, Referer and User-Agent
log.writeLogLine(inet_ntoa(sock->client.sin_addr), request_line, 200, size, headermap["Referer"], headermap["User-Agent"]);
[/code]
Here, with the headermap, it is obvious what values I am passing. Before the rewrite, I just had a bunch of tokens[3], tokens[5], etc.
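Here is a sketch of the name/value split described above. It uses `std::unordered_map`, the standardized successor to the old non-standard `hash_map`. This is an illustration under assumptions, not Shelob's getHeaders():

- It takes the header block (without the request line) as one string.
- It splits on LF and drops a trailing CR.
- It splits each line at the first colon, trims the value, and skips lines with no colon.

HTTP header names are case-insensitive, so a lookup like `headermap["Referer"]` would miss a client that sends `referer`. For that reason, this sketch lower-cases names, and lookups use lower-case keys.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>

typedef std::unordered_map<std::string, std::string> HeaderMap;

// Hypothetical parser: turn "Name: value" lines into a map keyed by lower-cased name.
HeaderMap parseHeaders(const std::string& block) {
    HeaderMap headers;
    std::istringstream in(block);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);                  // drop the CR of CRLF
        if (line.empty()) break;                           // blank line ends the headers
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos || colon == 0) continue;  // malformed: skip

        std::string name = line.substr(0, colon);
        for (std::string::size_type i = 0; i < name.size(); ++i)
            name[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(name[i])));

        // Split at the FIRST colon so values like URLs keep their own colons.
        std::string::size_type b = line.find_first_not_of(" \t", colon + 1);
        std::string::size_type e = line.find_last_not_of(" \t");
        std::string value = (b == std::string::npos) ? "" : line.substr(b, e - b + 1);

        headers[name] = value;   // a repeated header keeps its last value in this sketch
    }
    return headers;
}
```

Calling `operator[]` with a missing key inserts an empty string. That is convenient when feeding a logger, but it also grows the map; `find()` avoids that.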
I'm also toying with the idea of privilege separation and chroot jails. This ties in with the microkernel-style approach from my previous post, similar to how Postfix works. While it is more secure, the programming challenges are significant. I may leave that for a later version. I still have a bit of cleanup to do before a release.
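For a process started as root, the usual chroot-plus-privilege-drop sequence is:

1. Call chroot() on the jail directory.
2. Call chdir("/") inside it.
3. Drop supplementary groups.
4. Drop the group ID.
5. Drop the user ID last, because after a successful setuid() to a non-root user the process can no longer change its groups.

The sketch below follows that order, assuming the jail directory and the unprivileged uid/gid come from the caller. `enterJail` is a hypothetical name, and `setgroups()` is a BSD/Linux extension rather than POSIX.

```cpp
#include <grp.h>        // setgroups() (BSD/Linux)
#include <sys/types.h>
#include <unistd.h>     // chroot(), chdir(), setgid(), setuid()

// Hypothetical helper: confine the process to jailDir and give up root.
// Returns false (errno set by the failing call) if any step fails; the caller
// should then exit rather than keep running half-privileged.
bool enterJail(const char* jailDir, uid_t uid, gid_t gid) {
    if (chroot(jailDir) != 0) return false;       // requires root
    if (chdir("/") != 0) return false;            // don't keep a cwd outside the jail
    if (setgroups(0, nullptr) != 0) return false; // drop supplementary groups
    if (setgid(gid) != 0) return false;           // group first, while still privileged
    if (setuid(uid) != 0) return false;           // after this, root is gone
    // Paranoia check: if root can be regained, report failure so the caller exits.
    if (uid != 0 && setuid(0) == 0) return false;
    return true;
}
```

Anything the server needs after this call must be opened beforehand, including the listening socket, log files and MIME table. That bookkeeping is a large part of what makes privilege separation harder to program.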
Aside:
Theo de Raadt gave a nice presentation on the exploit mitigation techniques OpenBSD uses, which relates to some of these ideas.
May 25, 2006 at 04:23 PM | categories: code, http, internet
I fussed around more with logging today, which led me to the parseHeader() function. Parsing is one of the weakest areas right now. For simplicity, I had implemented it by tokenizing on spaces, shoving the tokens into a string vector, and then iterating over that vector for the tokens I needed.
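Splitting the request line on spaces can still be safe if the parser insists on exactly three parts and checks each one, rather than indexing into whatever tokens came back. The sketch below assumes the trailing CRLF has already been stripped. The `RequestLine` struct and `parseRequestLine()` function are hypothetical, not part of Shelob.

```cpp
#include <string>

struct RequestLine {
    std::string method;   // e.g. "GET"
    std::string target;   // e.g. "/index.html"
    std::string version;  // e.g. "HTTP/1.1"
};

// Hypothetical parser for "METHOD SP TARGET SP VERSION".
// Returns false instead of guessing when the line does not have that exact shape.
bool parseRequestLine(const std::string& line, RequestLine& out) {
    std::string::size_type sp1 = line.find(' ');
    if (sp1 == std::string::npos || sp1 == 0) return false;
    std::string::size_type sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string::npos || sp2 == sp1 + 1) return false;    // missing or empty target
    if (line.find(' ', sp2 + 1) != std::string::npos) return false;  // too many parts

    out.method = line.substr(0, sp1);
    out.target = line.substr(sp1 + 1, sp2 - sp1 - 1);
    out.version = line.substr(sp2 + 1);

    // This sketch only accepts versions it understands and absolute paths.
    if (out.version != "HTTP/1.0" && out.version != "HTTP/1.1") return false;
    if (out.target.empty() || out.target[0] != '/') return false;
    return true;
}
```

Rejecting anything unexpected (double spaces, missing parts, odd versions) is a cheap way to keep the rest of the code from having to reason about malformed input.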
So far, I haven't peeked at anyone else's source code; Shelob is a clean-room implementation of a basic HTTP server. However, I really need to clean up the parser. I thought about going with a full lexer using flex or something similar, but that is probably overkill. Plus, I'd rather not add another dependency. More thought is needed here, and maybe some research into how other people do this. This is an area where security can easily go wrong, so it needs to be done right.
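A hand-written parser (no flex) can stay on the safe side with two cheap rules:

- Cap the length of anything read from the client.
- Validate header names against the token character set from the HTTP/1.1 specification (RFC 7230, section 3.2.6), rather than accepting whatever bytes arrive.

The sketch below implements both checks. The 8 KB limit and the function names are assumptions for illustration.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Assumed limit: refuse header lines longer than this instead of buffering them.
const std::string::size_type kMaxHeaderLine = 8192;

// tchar from RFC 7230: ALPHA / DIGIT / one of "!#$%&'*+-.^_`|~".
static bool isTokenChar(unsigned char c) {
    return std::isalnum(c) || (c != '\0' && std::strchr("!#$%&'*+-.^_`|~", c) != nullptr);
}

// Hypothetical check applied to each raw header line (CR already stripped) before splitting.
bool headerLineAcceptable(const std::string& line) {
    if (line.size() > kMaxHeaderLine) return false;
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos || colon == 0) return false;
    for (std::string::size_type i = 0; i < colon; ++i)
        if (!isTokenChar(static_cast<unsigned char>(line[i]))) return false;
    // Reject NUL, other control characters and DEL in the value (tab is allowed).
    for (std::string::size_type i = colon + 1; i < line.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if ((c < 0x20 && c != '\t') || c == 0x7f) return false;
    }
    return true;
}
```

Checks like these keep the parser small and dependency-free, while closing off the oversized and binary-garbage inputs that hand-rolled parsers most often mishandle.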
The other thought I had while poking around is that I could make each component into its own server, a sort of mini-microkernel approach. I can imagine a swarm of different servers, all able to communicate with each other. You could have the log server running on one host, separate CGI servers for each user, as well as different backends. The only thing I'm not sure about is how much overhead this would add. A lot of the interprocess communication could happen over local UNIX sockets, FIFOs, or even shared memory, but it would be awesome if it all worked fast over a regular socket. Yet more thought needed here.
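For the local case, `socketpair(AF_UNIX, SOCK_STREAM, 0, fds)` gives two connected endpoints, one for the parent and one for a forked child such as a separate log server. The sketch below covers only the transport and assumes newline-terminated messages. `makeChannel`, `sendLine` and `recvLine` are hypothetical helpers.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <string>

// Create a connected pair of local stream sockets. After fork(), the parent
// would keep fds[0] and the child (e.g. a log server) would keep fds[1].
bool makeChannel(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0;
}

// Write the whole message plus a '\n' terminator, retrying on short writes.
// (No EINTR handling in this sketch.)
bool sendLine(int fd, const std::string& msg) {
    std::string data = msg + "\n";
    std::string::size_type sent = 0;
    while (sent < data.size()) {
        ssize_t n = write(fd, data.data() + sent, data.size() - sent);
        if (n <= 0) return false;
        sent += static_cast<std::string::size_type>(n);
    }
    return true;
}

// Read one byte at a time up to '\n' (simple, not fast). False on EOF or error.
bool recvLine(int fd, std::string& out) {
    out.clear();
    char c;
    for (;;) {
        ssize_t n = read(fd, &c, 1);
        if (n <= 0) return false;
        if (c == '\n') return true;
        out += c;
    }
}
```

The helpers only see a file descriptor, so they would work unchanged on a connected TCP socket. That is what would let the log server move to another host.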
So far I'm having a blast playing with this program. It is nice to write something for yourself and make only the trade-offs you decide on. I don't have any customers or management trying to shoehorn this thing into something I don't want. Even if I never release it, it is a good brain exercise.
May 24, 2006 at 10:22 PM | categories: code, http
Today I added support for NCSA/Apache-style logs. It has been nearly 2 years since I last touched this code and closer to 3 since I first wrote it. Surprisingly, I'm able to make modifications pretty easily, which suggests to me that the design is semi-clean. The odd thing about Shelob is that it is literally my first C++ program. I had never so much as written a Hello World in C++ before writing a web server. Granted, I had done a fair amount of C before this, and I'm using C++ mostly for the STL and namespaces.
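The NCSA/Apache Combined Log Format extends the Common Log Format (`host ident authuser [time] "request" status bytes`) with the Referer and User-Agent in quotes. A `-` stands in for any missing field. The sketch below formats one such line from the same values the writeLogLine() call in the May 27 post passes. The function name is hypothetical, ident and authuser are always `-`, and the timestamp is written in UTC (`+0000`) to keep things simple.

```cpp
#include <ctime>
#include <sstream>
#include <string>

// "-" is the log-format convention for a missing value.
static std::string orDash(const std::string& s) { return s.empty() ? "-" : s; }

// Hypothetical formatter for one Combined Log Format line.
// A hardened version should also escape '"' and control characters in quoted fields.
std::string combinedLogLine(const std::string& host, std::time_t when,
                            const std::string& requestLine, int status,
                            unsigned long bytes, const std::string& referer,
                            const std::string& userAgent) {
    char stamp[32];
    std::tm utc;
    gmtime_r(&when, &utc);                        // POSIX, thread-safe gmtime
    std::strftime(stamp, sizeof stamp, "%d/%b/%Y:%H:%M:%S +0000", &utc);

    std::ostringstream line;
    line << host << " - - [" << stamp << "] \"" << requestLine << "\" "
         << status << ' ' << bytes << " \"" << orDash(referer) << "\" \""
         << orDash(userAgent) << '"';
    return line.str();
}
```

Because the output matches what Apache writes, existing log analyzers should be able to read Shelob's logs without a custom parser.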
It isn't completely OOP, but neither is C++. One of the big things I was trying to do with Shelob was to use C++ strings exclusively, but I found out quickly that it is almost impossible not to drop down to C-style "strings" at some point, especially when dealing with sockets. Right now Shelob is very incomplete, but it does have the following features:
- Compiles cleanly on Solaris/SPARC, OpenBSD/PPC, OS X/PPC, Linux/x86
- Binary is less than 60K
- Supports HTTP/1.1 Keep-Alive
- Basic log file support
- A filter class (currently supports adding a footer to every HTML page before serving)
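A footer filter like the one in the last item can be a small class that rewrites an HTML body before it is sent. The sketch below assumes a hypothetical `Filter` interface with a single `apply()` method, not Shelob's actual class. It inserts the footer just before the closing `</body>` tag, matched case-insensitively, and appends it at the end if there is none.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical filter interface: transform a response body before it is served.
class Filter {
public:
    virtual ~Filter() {}
    virtual std::string apply(const std::string& body) const = 0;
};

class FooterFilter : public Filter {
public:
    explicit FooterFilter(const std::string& footer) : footer_(footer) {}

    std::string apply(const std::string& body) const override {
        // Search a lower-cased copy so "</BODY>" is found too.
        std::string lower(body);
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        std::string::size_type pos = lower.rfind("</body>");
        std::string out(body);
        if (pos == std::string::npos) out += footer_;   // no </body>: append at the end
        else out.insert(pos, footer_);                  // otherwise insert before it
        return out;
    }

private:
    std::string footer_;
};
```

The filter changes the body's length, so any Content-Length header has to be computed from the filtered output, not from the file size on disk.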
Currently it uses a forking model, but I'm considering moving to a select() model for speed. I would also like to be able to run it on Win32, but that is a much lower priority. It would be nice if Vista supported forking. I have some ideas for future features, but some areas of the current code are a little rough and need refactoring. I also need to decide which license to release it under. I'm leaning towards BSD, but the GPL is a close second. I should probably look at which licenses other web servers use.
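A select()-based server replaces fork-per-connection with a single loop. Each pass asks the kernel which descriptors are ready and services only those. A full event loop is more than a short example, so the sketch below shows just the core of one pass. `readableFds` and `drainOnce` are hypothetical helpers.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>     // read()
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for one pass of a select() loop: wait up to timeoutMs for
// any of fds to become readable and return the ones that are.
// Every fd must be below FD_SETSIZE, a hard limit of select().
std::vector<int> readableFds(const std::vector<int>& fds, long timeoutMs) {
    fd_set readSet;
    FD_ZERO(&readSet);
    int maxFd = -1;
    for (std::size_t i = 0; i < fds.size(); ++i) {
        FD_SET(fds[i], &readSet);
        if (fds[i] > maxFd) maxFd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    std::vector<int> ready;
    // select() overwrites readSet and may modify tv, so both are rebuilt on every call.
    if (select(maxFd + 1, &readSet, nullptr, nullptr, &tv) <= 0) return ready;  // timeout or error
    for (std::size_t i = 0; i < fds.size(); ++i)
        if (FD_ISSET(fds[i], &readSet)) ready.push_back(fds[i]);
    return ready;
}

// Read whatever is available on a descriptor select() reported as readable.
// A return value of 0 means the peer closed the connection.
ssize_t drainOnce(int fd, std::string& out) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) out.append(buf, static_cast<std::size_t>(n));
    return n;
}
```

In the server loop, `fds` would hold the listening socket plus every open client socket. A ready listener means accept() a new connection; a ready client means read more of its request. poll() does the same job without the FD_SETSIZE limit.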