             Publication
Journal Publication
Research Title Bottom-up region extractor for semi-structured web pages  
Date of Distribution 31 July 2014 
Conference
     Title of the Conference 18th Computer Science and Engineering Conference (ICSEC2014)  
     Organiser Computer Science, Faculty of Science, Khon Kaen University 
     Conference Place HOTEL PULLMAN KHON KAEN RAJA ORCHID  
     Province/State Khon Kaen 
     Conference Date 30 July 2014 
     To 1 August 2014 
Proceeding Paper
     Volume 2014 
     Issue
     Page 284 - 289 
     Editors/edition/publisher IEEE 
     Abstract Database-backed websites generally provide interfaces that give users access to their structured data. These data are usually presented as data records within a coherent region of a result page. However, the page usually contains not only the data region but also other, extraneous regions. Therefore, the key tasks in extracting data records from these semi-structured web pages are identifying the relevant data regions and ignoring the irrelevant ones. To address this problem, this paper proposes a region extractor that serves as a preprocessing tool, helping an information extractor locate and extract the relevant data records from web pages. Most existing works analyze the DOM tree of an input page in a top-down manner. In contrast, the proposed method traverses the DOM tree bottom-up: the similarity of the leaf nodes is analyzed first to find a set of data items. After that, their parent nodes are recursively analyzed to identify data records and then data regions. The proposed method is a completely unsupervised, maintenance-free wrapper. For performance evaluation, it is empirically tested on 15 real-world websites. Experiments show that the proposed method achieves 94.37% accuracy in data record extraction and outperforms the well-known top-down method DEPTA (55.39%).
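     Illustration (not from the paper): the bottom-up idea in the abstract can be sketched roughly as follows. This is only a minimal sketch under stated assumptions. The abstract does not say how similarity is measured, so the sketch simply treats sibling nodes with identical tag structure as similar. The names find_regions, PAGE, and min_records are hypothetical and are not the authors' implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, well-formed result page used only for illustration.
PAGE = """<html><body>
<div id="nav"><a>Home</a></div>
<ul id="results">
  <li><b>Title A</b><i>10</i></li>
  <li><b>Title B</b><i>12</i></li>
  <li><b>Title C</b><i>15</i></li>
</ul>
</body></html>"""


def find_regions(root, min_records=2):
    """Return (region label, record count) pairs found by a bottom-up pass.

    Assumption: two sibling nodes are "similar" when their tag structure
    is identical. The paper's actual similarity measure is not given in
    the abstract.
    """
    sig = {}       # node -> structural signature of its subtree
    regions = []
    # Iterative post-order walk: every child is processed before its parent.
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not children_done:
            stack.append((node, True))
            stack.extend((child, False) for child in node)
            continue
        # Leaves get "tag()"; parents embed their children's signatures.
        sig[node] = node.tag + "(" + ",".join(sig[c] for c in node) + ")"
        # Group this node's children by signature.
        groups = defaultdict(list)
        for child in node:
            groups[sig[child]].append(child)
        for members in groups.values():
            # Repeated, non-leaf children are treated as data records,
            # which makes the current node a candidate data region.
            if len(members) >= min_records and len(members[0]) > 0:
                regions.append((node.get("id", node.tag), len(members)))
    return regions


print(find_regions(ET.fromstring(PAGE)))  # [('results', 3)]
```

     In this sketch the navigation block is ignored because it contains no repeated record structure, while the list with three structurally identical entries is reported as a data region.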
Author
537020029-1 Mr. WACHIRAWUT THAMVISET [Main Author]
Science Doctoral Degree

Peer Review Status Reviewed by independent reviewers 
Level of Conference International 
Type of Proceeding Full paper 
Type of Presentation Oral 
Part of thesis true 
Presentation awarding false 
Citation 0