Mining the Tagged Web: "Several years ago, researchers at the IBM Almaden Research Center in San Jose, Calif., began an effort to study the Web as a mathematical graph: a collection of nodes (representing Web pages) and lines (representing hyperlinks). They were interested in studying various properties of this graph, including its diameter and connectedness, to obtain insights into algorithms for crawling and searching the Web and to characterize the Web's sociological evolution.
"To obtain data, the researchers conducted Web crawls that encompassed 200 million pages and 1.5 billion hyperlinks. They confirmed that the distribution of pages and link number follows a simple mathematical relationship known as a power law. In essence, most pages incorporate just a few outgoing links, whereas a few pages have a huge number."
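The power-law observation can be made concrete with a small sketch. It assumes a crawl is available as a plain list of (source, target) hyperlink pairs, a hypothetical input format and not the one the IBM team actually used. The function names below are illustrative only. The code tallies how many pages have each number of outgoing links, then makes a rough estimate of the exponent alpha in n(k) ∝ k^(-alpha) by fitting a straight line on log-log axes. That least-squares fit is a crude, swapped-in estimator; maximum-likelihood methods are generally preferred for real crawl data.

```python
import math
from collections import Counter


def out_degree_distribution(edges):
    """Return a Counter mapping out-degree k -> number of pages with k outgoing links.

    `edges` is an iterable of (source_page, target_page) pairs, one per hyperlink
    (a hypothetical input format). Pages that appear only as link targets are
    counted with out-degree 0.
    """
    out_deg = Counter()
    pages = set()
    for src, dst in edges:
        out_deg[src] += 1
        pages.update((src, dst))
    # Counter returns 0 (without inserting) for pages that never appear as a source.
    return Counter(out_deg[p] for p in pages)


def estimate_exponent(dist):
    """Rough estimate of alpha in n(k) ~ C * k**(-alpha).

    Fits a least-squares line to (log k, log n(k)) for k >= 1 and returns the
    negated slope. Pages with k = 0 are skipped because log(0) is undefined.
    """
    pts = [(math.log(k), math.log(n)) for k, n in dist.items() if k > 0 and n > 0]
    if len({x for x, _ in pts}) < 2:
        raise ValueError("need at least two distinct nonzero out-degrees")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx


if __name__ == "__main__":
    # Toy link graph; a real crawl would have hundreds of millions of pages.
    links = [("home", "about"), ("home", "blog"), ("home", "contact"), ("blog", "home")]
    print(sorted(out_degree_distribution(links).items()))  # [(0, 2), (1, 1), (3, 1)]
    # Synthetic counts that follow n(k) = 64 * k**-2 exactly, so alpha should be about 2.
    print(estimate_exponent({1: 64, 2: 16, 4: 4, 8: 1}))
```

In a power-law crawl, the resulting table shows many pages at small k and a thin tail of pages with very large k, which is the "few pages have a huge number" pattern the article describes.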
"'In a sense, the Web is much like a complicated organism, in which the local structure on a microscopic scale looks very regular (like a biological cell), but the global structure exhibits interesting morphological structures (body and limbs) that are not obviously evident in the local structure,' Ravi Kumar of IBM and his coworkers concluded in a paper presented in 2000 at the Ninth World Wide Web Conference.
"The effort to amass data about the structure and content of the rapidly growing Web didn't end there. It continued and now encompasses about half of the Web and includes much 'informal' communication, such as Web logs, newsgroups, and chat rooms. The resulting panoply of data has become the basis of an ambitious commercial service that IBM recently launched called WebFountain."
"Both Google and WebFountain stemmed from academic research about text mining and the insight that the best way to find information is to focus on the biggest and most popular sites and Web pages. WebFountain goes one step further in trying to make sense of the pages themselves by tagging the information in a clear, consistent way. Any data miner that comes along now has a vast playing field on which to test its skill and prove its value."
Friday, March 5, 2004