A DOM Tree Alignment Model for Mining Parallel Data From the Web

Publisher: Microsoft
Format: 120.7KB PDF
Date added: 26 May 2006
Topics: Knowledge and Data Management, Parallel Processing, Data Mining - Analysis
Downloads: 101

This paper presents a new web mining scheme for acquiring parallel data. Each web page is represented as a tree using the Document Object Model (DOM). A DOM tree alignment model is then proposed to identify translationally equivalent texts and hyperlinks between two parallel DOM trees. By following the identified parallel hyperlinks, parallel web documents are mined recursively. Benchmarks show that, compared with previous mining schemes, the new scheme improves mining coverage, reduces mining bandwidth, and enhances the quality of the mined parallel sentences.
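
The pipeline described in the abstract can be illustrated with a rough Python sketch. This is not the paper's alignment model: the abstract gives no details of it, so a naive lockstep comparison of child nodes stands in for it here. The names `Node`, `TreeBuilder`, `parse`, `align`, `mine`, and the `fetch` callback are all hypothetical, and the in-memory `pages` dictionary replaces real HTTP requests.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One node of a simplified DOM tree (hypothetical helper type)."""

    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(Node("#text", text=text))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def align(a, b, texts, links):
    """Naive stand-in for tree alignment: pair children in document order."""
    if a.tag != b.tag:
        return
    if a.tag == "#text":
        texts.append((a.text, b.text))
        return
    if a.tag == "a" and "href" in a.attrs and "href" in b.attrs:
        links.append((a.attrs["href"], b.attrs["href"]))
    for child_a, child_b in zip(a.children, b.children):
        align(child_a, child_b, texts, links)


def mine(url_a, url_b, fetch, seen=None):
    """Collect aligned text pairs, then follow aligned hyperlink pairs."""
    seen = set() if seen is None else seen
    if (url_a, url_b) in seen:
        return []
    seen.add((url_a, url_b))
    texts, links = [], []
    align(parse(fetch(url_a)), parse(fetch(url_b)), texts, links)
    for link_a, link_b in links:
        texts.extend(mine(link_a, link_b, fetch, seen))
    return texts


# Toy bilingual site held in memory (assumed data, for illustration only).
pages = {
    "en/index": '<html><body><h1>Welcome</h1><a href="en/about">About us</a></body></html>',
    "fr/index": '<html><body><h1>Bienvenue</h1><a href="fr/about">À propos</a></body></html>',
    "en/about": "<p>We build tools.</p>",
    "fr/about": "<p>Nous créons des outils.</p>",
}
pairs = mine("en/index", "fr/index", pages.__getitem__)
```

Lockstep pairing fails as soon as the two pages differ in structure, which is presumably the case a dedicated alignment model is meant to handle. The `seen` set keeps the recursion from revisiting a page pair when sites link back to themselves.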
