| Field | Value |
|---|---|
| Publisher | Association for Computing Machinery |
| Format | PDF (136.6 KB) |
| Date added | 08 May 2007 |
| Topics | Spam, E-mail Fraud, Phishing |
| Downloads | 23 |

This paper studies the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thereby increasing their site traffic. The contributions are twofold. First, the paper finds that the method of dataset construction is crucial for accurate spam classification, and it notes that this problem arises in learning problems generally and can be hard to detect. In particular, the paper finds that ensuring the training and test sets share no domains is necessary to evaluate a web spam classifier accurately; measured precision can differ by as much as 40% when non-domain-separated data is used. Second, the paper shows that rank-time features can improve the performance of a web spam classifier.
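
The domain-separation requirement described in the abstract can be illustrated with a small sketch. This is not the paper's own procedure: the function names (`domain_of`, `domain_separated_split`), the `test_fraction` and `seed` parameters, and the use of the URL host as a stand-in for "domain" are all assumptions made for illustration. The idea is simply to assign whole domains, rather than individual pages, to either the training set or the test set.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL.

    Assumption: the host is used as the 'domain'. A real pipeline might
    instead group by registered domain (e.g. via the Public Suffix List).
    """
    return urlparse(url).netloc.lower()


def domain_separated_split(examples, test_fraction=0.2, seed=0):
    """Split (url, label) pairs so that no host appears in both sets.

    test_fraction is applied to the number of distinct hosts, not pages,
    so the page-level split ratio may differ from it.
    """
    # Group all examples by host so each host can be assigned as a unit.
    by_domain = defaultdict(list)
    for url, label in examples:
        by_domain[domain_of(url)].append((url, label))

    # Shuffle hosts deterministically and reserve a share for testing.
    domains = sorted(by_domain)
    random.Random(seed).shuffle(domains)
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train, test = [], []
    for domain, group in by_domain.items():
        (test if domain in test_domains else train).extend(group)
    return train, test
```

By contrast, a plain random split over pages can place pages from the same site on both sides, which is the kind of overlap the paper reports as inflating measured precision.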