ZDNet UK

A Pitfall and Solution in Multi-Class Feature Selection for Text Classification

Publisher: Hewlett-Packard
Topics: Knowledge and Data Management
Date added: 08 Jul 2004
Format: PDF, 318.4KB

Information Gain is a well-known and empirically proven method for high-dimensional feature selection. However, the author found that it and other existing methods failed to produce good results on an industrial text classification problem. Investigating the root cause revealed that a large class of feature scoring methods suffers from a pitfall: they can be blinded by a surplus of strongly predictive features for some classes while largely ignoring the features needed to discriminate the difficult classes. This paper demonstrates that this pitfall hurts performance even on a relatively uniform text classification task. Based on this understanding, it presents solutions inspired by round-robin scheduling that avoid the pitfall without resorting to costly wrapper methods.
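To make the pitfall concrete, here is a minimal sketch, assuming binary term-presence features and Information Gain computed one-vs-rest for each class. The helper names (`info_gain`, `score_table`, `top_k_by_max`, `round_robin`) and the toy corpus are hypothetical and do not come from the paper. `top_k_by_max` stands in for a single global ranking by scoring each term with its best per-class IG. `round_robin` lets the classes take turns contributing their best unselected term, which is one simple reading of the round-robin idea named above and not necessarily the paper's exact algorithm.

```python
import math
from typing import Dict, List, Sequence, Set


def entropy(pos: int, total: int) -> float:
    """Binary entropy (bits) of a set with `pos` positives out of `total` items."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(docs: Sequence[Set[str]], is_pos: Sequence[bool], term: str) -> float:
    """IG of the binary feature 'term occurs in doc' for a one-vs-rest class."""
    n = len(docs)
    with_t = [p for d, p in zip(docs, is_pos) if term in d]
    without_t = [p for d, p in zip(docs, is_pos) if term not in d]
    h_cond = ((len(with_t) / n) * entropy(sum(with_t), len(with_t))
              + (len(without_t) / n) * entropy(sum(without_t), len(without_t)))
    return entropy(sum(is_pos), n) - h_cond


def score_table(docs, labels, vocab) -> Dict[str, Dict[str, float]]:
    """table[c][t] = one-vs-rest IG of term t for class c."""
    table = {}
    for c in sorted(set(labels)):
        is_pos = [lab == c for lab in labels]
        table[c] = {t: info_gain(docs, is_pos, t) for t in vocab}
    return table


def top_k_by_max(table, k: int) -> List[str]:
    """Global ranking (hypothetical stand-in): best per-class IG per term, top k."""
    vocab = next(iter(table.values())).keys()
    best = {t: max(scores[t] for scores in table.values()) for t in vocab}
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def round_robin(table, k: int) -> List[str]:
    """Classes take turns adding their best not-yet-chosen term until k are chosen."""
    rankings = {c: sorted(s, key=lambda t: (-s[t], t)) for c, s in table.items()}
    nxt = dict.fromkeys(rankings, 0)  # next candidate index per class
    chosen: List[str] = []
    while len(chosen) < k:
        progressed = False
        for c in sorted(rankings):
            ranked = rankings[c]
            while nxt[c] < len(ranked) and ranked[nxt[c]] in chosen:
                nxt[c] += 1  # skip terms another class already claimed
            if nxt[c] < len(ranked) and len(chosen) < k:
                chosen.append(ranked[nxt[c]])
                nxt[c] += 1
                progressed = True
        if not progressed:  # every class has run out of candidate terms
            break
    return chosen


# Hypothetical toy corpus: one large "easy" class and two small ones.
DOCS = [{"goal", "match", "score"}] * 4 + [{"chip"}] * 2 + [{"compiler"}] * 2
LABELS = ["sports"] * 4 + ["hardware"] * 2 + ["software"] * 2
VOCAB = sorted(set().union(*DOCS))

if __name__ == "__main__":
    table = score_table(DOCS, LABELS, VOCAB)
    print(top_k_by_max(table, 3))  # ['goal', 'match', 'score']
    print(round_robin(table, 3))   # ['chip', 'compiler', 'goal']
```

On this toy corpus the large `sports` class has three perfectly predictive terms that each score 1.0 bit, while the perfectly predictive terms for the smaller classes score only about 0.81 bits. The global ranking therefore spends its whole budget of three on `sports`, whereas the round-robin pass keeps one term each for `hardware` and `software`.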

Related white papers

Outbound Email and Data Loss Prevention in Today's Enterprise, 2008

How concerned are companies about the content of email leaving their organizations? And how do companies manage the legal and financial risks associated with outbound email? To find out, Proofpoint...


PEM Division of BHEL Improves Team Collaboration and Customer Satisfaction

Bharat Heavy Electricals Limited (BHEL), one of the largest manufacturing and engineering companies in India, has over 180 products in the energy and energy infrastructure sectors. The Project Engineering Management...


SQL Server 2005 Administration

Learn to administer and tune SQL Databases and Servers. Description: Get the knowledge and skills you need to maintain a Microsoft SQL Server 2005 database in this 5-day course. Learn how to...


Microsoft Exchange Server 2007

Learn about Exchange 2007 and how to leverage its features. Description: Exchange Server 2007 is the latest version of Microsoft's premier messaging application. In this hands-on course, you'll learn to install and...


Powerful Data Warehousing Performance With IBM Red Brick Warehouse

A data warehouse typically provides business intelligence value through a Relational Database Management System (RDBMS) optimized for Online Analytical Processing (OLAP), which deals with extracting and viewing data from different...


Metrics Guide for Knowledge Management Initiatives

The Department of Navy (DON) Chief Information Officer (CIO) has led the development of an Information Management/Information Technology (IM/IT) Strategic Plan to build a knowledge sharing culture and exploit new...


Oracle Streams - Replication Tips and Techniques

Oracle Streams is a uniquely flexible feature for information sharing within Database 10g. The three basic elements of Oracle Streams (capture, staging, and consumption) can be configured in a number...


Featured White paper

A Blueprint for Better Management from the Desktop to the Data Center

In the new service-oriented world, virtualization is critical. However, virtualization brings a new set of management challenges. The introduction of virtual machine operating system “images” as a first-class IT asset necessitates OS image lifecycle management, covering instantiation, usage and retirement.

Other White Papers

IDC reports on Novell's Secure Desktop Solution: A Modern-Day Marriage of Business Benefit and Risk Reduction

The increasing mobility of the modern workforce and the competitive requirement to optimise that...

IDC Executive Brief: The Rising Concerns Over Endpoint Security

Today's IT environment is increasingly vulnerable to threats and attacks, both from within and...