Some people will take issue with my premise. They argue that the Semantic Web is a "non-starter" - an idea that came from deep thinkers but never moved into the mainstream. And many people hate the Web 3.0 terminology. I am not trying to entertain either debate, but merely to present practical ways that we all can move the web forward.
It seems to me that there are three ways to help unlock the massive amount of data that exists on the internet. One method is to build semantics into every significant web page on the internet. Using technologies like RDF and Microformats, website editors can encode their data to include hints about what the data means. Efforts are underway in this direction, but it's a massive undertaking, and can really only be done by website owners, one by one, modifying each page (with some obvious automations).
Second, there's the API approach. Application Programming Interfaces are tools that allow programmers to easily access data without having to deal with the typical presentation information that's on a webpage, such as styles, layout, and things that help humans.
Best Buy recently released their API for access to their trove of product and store information. If each data owner releases an API, then the data can be unlocked, and meaning can be derived from the data.
Finally, there's the page-scraping approach. Search engines currently do this - they try to figure out what data "means" by examining the page as is. In addition, individual developers write little page-scraping snippets that go get the data that they want. Python is an exceptional language for this task (reportedly, much of Google's early code was written in Python, and more recently, the creator of Python is employed by Google).
Let me show you what each of these techniques looks like, before I present my practical recommendation:
Method 1: Embedding Semantic information into the web page
Embedded semantic information does not change how websites appear to humans. However, embedded into the source of the web page would be little clues as to the meaning of the data.
This is already taking place in some places around the web. In fact, Blogger (and other blog software) will do this for you automatically. Below, you can see this blog, and the semantic information that is embedded behind the scenes.

This is more than just HTML tags. Specific classes are used which describe the type of data that follows. So the Blog Post's Title is surrounded by tags that say "hey search engines, this next part is the blog post title". The HTML tags of the 1990s simply described the page layout, but these tags describe the page semantics - what the page means.
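To make that concrete, here is a minimal Python sketch of how a program might read those clues. The class names (hentry, entry-title, entry-content) follow the hAtom microformat that some blog templates emit; the sample markup, the post title, and the helper names are my own assumptions for illustration, not the actual source of this blog.

```python
from html.parser import HTMLParser

# Made-up sample markup. Only the class names ("hentry", "entry-title",
# "entry-content") come from the hAtom convention; the content is invented.
SAMPLE_PAGE = """
<div class="post hentry">
  <h3 class="post-title entry-title">Three Ways to Unlock the Web</h3>
  <div class="post-body entry-content">Semantics, APIs, and scraping.</div>
</div>
"""

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # greater than 0 while inside the matching element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested element inside the match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # stop after the first matching element

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_for_class(html, wanted_class):
    """Return the stripped text of the first element with wanted_class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    return "".join(parser.chunks).strip()


title = text_for_class(SAMPLE_PAGE, "entry-title")
body = text_for_class(SAMPLE_PAGE, "entry-content")
```

Notice that the program never looks at the h3 or div tags themselves. It only asks for "entry-title", which is exactly why agreeing on the class names matters so much.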

The challenge with this method is to get everyone on the same page! If everyone uses the same class name designations for the same purposes, then this system has a great chance of success, and is already showing signs of major progress. Organizations have sprung up to describe formats (like Microformats), so that class names are consistent across the web. If they aren't consistent, then we are no better off after semantic encoding than before.
Method 2: Application Programming Interfaces (APIs)
APIs can be any sort of programmatic interface to an application, but lately a certain style has emerged: URL-based (RESTful) APIs. With this technique, website and data owners write a second set of web pages that are really not meant for humans. Instead, they are meant for other programs to access the data.
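Here is a rough Python sketch of what talking to a URL-based API can look like from the developer's side. Everything specific in it is an assumption of mine: the api.example.com endpoint, the query parameters, and the JSON field names are hypothetical and are not taken from any real retailer's API. The response is a canned string so the sketch runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1"


def build_query_url(resource, **params):
    """Compose a request URL: the resource path plus sorted query parameters."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}?{query}"


def parse_products(payload):
    """Turn a JSON response body into a list of (name, price) tuples."""
    data = json.loads(payload)
    return [(item["name"], item["price"]) for item in data["products"]]


# A canned response standing in for what such an endpoint might return.
SAMPLE_RESPONSE = '{"products": [{"name": "Example Netbook", "price": 299.99}]}'

url = build_query_url("products", name="netbook", format="json")
products = parse_products(SAMPLE_RESPONSE)
```

The point is that there are no styles or layout to dig through. The program asks for data by URL and gets back data, already labeled by field name.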
Below is an example of the Best Buy Remix API which was released a few weeks ago. Best Buy has made it possible for developers around the world to access their store and product data.

Some corporations and individuals view this as a risk. After all, if I expose all my pricing to all my competitors, what's stopping them from offering everything for one penny less, and beating me on every deal? Or what about ad revenue from my page?
But I think that Remix is actually genius on the part of Best Buy. After all, the semantic web WILL happen. You have a choice of being first and defining the standard (forcing your competitors to chase you), or trying to protect the old way of doing things and ending up in a follower position.
This reminds me a little of an article I just read in this month's Wired Magazine, related to how the "Netbook" caught the traditional laptop manufacturers off guard. The key learning is that if you are too worried about protecting your old business, you might lose sight of a huge new business opportunity, and then forever be chasing the new model. No, Best Buy was right on - release the API, define the standard.
Here's what I think is going to happen in the API field: as developers begin to write code for the Best Buy API, it will bring value to Best Buy. And then Wal-Mart and Target and others will be scrambling to catch up. Some companies may take the typical "Sony tactic" and try to invent their own standard. But by then it will be too late. Others will copy Best Buy's API definition (be "Remix-compatible"), and be immediately compatible with developers' code that is already being written. But neither of these positions is as desirable as Best Buy's position, of getting to define the standard!
APIs are a great way to move forward toward the next evolution of the web, as I mentioned in my earlier blog post.
Method 3: Page Scraping
The concept of Page Scraping seems ugly and primitive. This has been going on for years, but both tools and knowledge are progressing so that this method has a realistic practical position in the future of the web.
This method starts with the user - some user who wants some information. In my example below, an AFL (Australian Football League) fan wanted programmatic access to scores that were available on the web.
This user simply navigated to the website that contains the information that he or she was looking for. And then a browser "view source" of the web page provides the format so that the page can be analyzed and the scraper can be built.

The user then wrote a small snippet of code that was capable of snagging that information, and feeding it to his program.
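A snippet like that can be surprisingly short. The Python sketch below is not the AFL fan's actual code; the table markup, the class names, and the scores are all made up by me to stand in for whatever "view source" might reveal on a real scores page.

```python
import re

# Made-up markup standing in for the source of a scores page.
SAMPLE_SOURCE = """
<table id="scores">
  <tr><td class="team">Geelong</td><td class="score">98</td>
      <td class="team">Hawthorn</td><td class="score">91</td></tr>
  <tr><td class="team">Carlton</td><td class="score">77</td>
      <td class="team">Collingwood</td><td class="score">102</td></tr>
</table>
"""

# One team cell followed immediately by its score cell.
CELL = re.compile(
    r'<td class="team">([^<]+)</td>\s*<td class="score">(\d+)</td>'
)


def scrape_scores(source):
    """Return a list of matches, each as ((team, points), (team, points))."""
    cells = [(team.strip(), int(score)) for team, score in CELL.findall(source)]
    # Cells appear two per match, so pair them up in order.
    return list(zip(cells[0::2], cells[1::2]))


matches = scrape_scores(SAMPLE_SOURCE)
```

This is also where the fragility shows. The pattern is tied to the exact markup of one page, so if the site owner renames a class or rearranges the table, the scraper quietly returns nothing until somebody fixes it.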

Finally, he published his code, and told the world. And now the world is just a tiny bit better off than it was before.

Take a close look here, and think about what's going on. Is one little application that grabs AFL scores significant? It doesn't seem like it. I'm from Ohio; what do I care about Australian Football?
More significant, though, is the fact that the world was made a little better by one person who had a need, and solved the problem on his own, and then provided it back to the community.
Does that model look at all familiar?

So to summarize, there seem to be three ways to advance the web, and provide meaning to the vast amounts of data that are out there. Two of them are top-down, and one is bottom-up.
Two are elegant and sturdy. One is ugly and fragile.


If the third is ugly and fragile, then why even talk about it?
I believe the answer lies in history.

Think about the landscape of the encyclopedia market ten years ago. Surely, Britannica knew that people wanted online access to an encyclopedia, but offering it would also probably kill their print business. They likely recognized that their print business would have inevitable pressures anyway. Sound vaguely like the predicament that Best Buy and others are in? Lead change, or forever be chasing it.
Nupedia popped up, with the idea of creating an online encyclopedia, and started hand-crafting well-researched articles, page by page. Never heard of Nupedia? Well, it's not easy for a small group of people to create an encyclopedia from scratch. To me, this sounds vaguely like the efforts to create the Semantic Web by recreating every web page and adding in semantics. These are hard problems. And Nupedia didn't get too far with its model of hand-crafting a free, well-researched, edited encyclopedia.
So in January 2001, Nupedia started a side project to allow collaboration on articles. It opened the process to the world, starting with fewer than twenty articles, and named it Wikipedia. As encyclopedias go, it was ugly and fragile. Fifth graders could change facts, or write that "Dean Wormer Sucks" right into the article for Thomas Jefferson. But bit-by-bit, contribution after contribution, the world got to be a better place. And now, eight short years later, Wikipedia is recognized as a tremendous wealth of knowledge.
Compare the charts below, describing the Approaches to an Online Encyclopedia, to the ones above, regarding Approaches to the Next Generation of the Web:


To me it seems obvious and inevitable. These two situations map exactly.
Someone will invent a free-forever, open source, contributory mechanism to release the information that is hidden in the web. And bit-by-bit, contribution-by-contribution, the world will become a better place.
And that's what the Amy Iris project is all about. We're about ready to enter a Beta Test of our attempt at a free-forever, open source, contributory framework for pulling information out of the vast data of the internet. If you are interested in participating and seeing what we think could unleash the power of the internet, please drop me an email. I'm at amyiris at amyiris dot com.
As always, I'm open to feedback!
Smile! You're gettin' it!




