Automatic selection of print-worthy content for enhanced web page printing experience

Publication type: Proceedings Article
Publication date: 2010-09-21
Abstract
The user experience of printing web pages has not been very good. Web pages typically contain content that is not print-worthy or informative, such as sidebars, footers, headers, advertisements, and auxiliary information for further browsing. Since including such content degrades the web printing experience, we have developed a tool that first selects the main part of the web page automatically and then allows users to make adjustments. In this paper, we describe the algorithm for selecting the main content automatically during the first pass. The web page is first segmented into several coherent areas, or blocks, using our web page segmentation method, which clusters content based on the affinity values between basic elements. The relative importance values of the segmented blocks are computed using various features, and the main content is extracted subject to the constraint that it forms a single DOM (Document Object Model) sub-tree with high importance scores. We evaluated our algorithm on 65 web pages and computed the accuracy based on the area of overlap between the ground truth and the result extracted by the algorithm.
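The abstract does not give the selection rule or the accuracy formula, so the following Python sketch only illustrates the two ideas under stated assumptions. The `Node` class, the area-weighted "importance mass", the `coverage` threshold, and the use of intersection-over-union as the overlap measure are all hypothetical choices made for illustration, not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DOM node carrying its own block's area and importance."""
    name: str
    area: float = 0.0        # rendered area of the node's own block (px^2)
    importance: float = 0.0  # assumed importance score in [0, 1]
    children: list[Node] = field(default_factory=list)


def mass(node: Node) -> float:
    """Area-weighted importance of a whole sub-tree (an assumed aggregate)."""
    return node.area * node.importance + sum(mass(c) for c in node.children)


def select_main_subtree(root: Node, coverage: float = 0.8) -> Node:
    """Return the deepest single sub-tree holding at least `coverage`
    of the page's total importance mass (assumed heuristic)."""
    total = mass(root)
    if total == 0:
        return root
    node = root
    while True:
        # With coverage > 0.5 at most one child can qualify.
        heavy = [c for c in node.children if mass(c) >= coverage * total]
        if not heavy:
            return node
        node = heavy[0]


def overlap_accuracy(truth, result) -> float:
    """Intersection-over-union of two (x, y, width, height) rectangles.
    One plausible area-of-overlap measure; the paper may define it differently."""
    tx, ty, tw, th = truth
    rx, ry, rw, rh = result
    iw = max(0.0, min(tx + tw, rx + rw) - max(tx, rx))
    ih = max(0.0, min(ty + th, ry + rh) - max(ty, ry))
    inter = iw * ih
    union = tw * th + rw * rh - inter
    return inter / union if union > 0 else 0.0


# Toy page: header, a main container (article + comments), sidebar, footer.
page = Node("body", children=[
    Node("header", area=8000, importance=0.1),
    Node("main", children=[
        Node("article", area=60000, importance=0.9),
        Node("comments", area=10000, importance=0.5),
    ]),
    Node("sidebar", area=12000, importance=0.1),
    Node("footer", area=5000, importance=0.05),
])

chosen = select_main_subtree(page)
print(chosen.name)
print(overlap_accuracy((0, 0, 100, 100), (0, 0, 100, 50)))
```

Raising `coverage` makes the sketch more conservative: it stops higher in the tree and keeps more surrounding content, which mirrors the trade-off between printing too much and cutting off part of the main content.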