Improving data extraction system to parse data from scraped job advertisements

Extracting information from an online job advertisement can be tricky. The relevant content is wrapped in redundant material, called boilerplate, that is unrelated to the job itself. The information also needs to be segmented and classified into the right classes or groups. Once the information has been classified, it is easier to find the features (e.g., required skills and required education) that make later processing faster.
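
The three steps named in the abstract (boilerplate removal, segmentation, classification) can be illustrated with a minimal sketch. This is not the thesis's method: the boilerplate patterns, the keyword lists, the class labels, and the function names below are all hypothetical, chosen only to show how the steps fit together on plain text.

```python
import re

# Hypothetical boilerplate markers; a real scraper would need site-specific rules.
BOILERPLATE_PATTERNS = [
    re.compile(r"apply now", re.I),
    re.compile(r"share this job", re.I),
    re.compile(r"cookie", re.I),
]

# Hypothetical class labels and keywords, loosely following the abstract's
# examples of "required skills" and "required education".
CLASS_KEYWORDS = {
    "required_skills": ("skill", "experience with", "proficient", "knowledge of"),
    "required_education": ("degree", "bachelor", "diploma", "graduate"),
}


def segment(text):
    """Split raw text into non-empty, stripped lines (one segment per line)."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def remove_boilerplate(segments):
    """Drop segments that match any boilerplate pattern."""
    return [s for s in segments
            if not any(p.search(s) for p in BOILERPLATE_PATTERNS)]


def classify(segment_text):
    """Return the first class whose keyword occurs in the segment, else 'other'."""
    lowered = segment_text.lower()
    for label, keywords in CLASS_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return label
    return "other"


def extract(text):
    """Segment the text, filter boilerplate, and group segments by class."""
    grouped = {}
    for seg in remove_boilerplate(segment(text)):
        grouped.setdefault(classify(seg), []).append(seg)
    return grouped
```

Keyword matching is used here only because it is easy to follow; a trained classifier could replace `classify` without changing the rest of the pipeline.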

Creators: CLAUDIA NATHASIA JASON
Contributors: Henry Novianus Palit (Advisor and Examination Committee)
Publisher: Universitas Kristen Petra
Language: English
Theme: Digital Theses
Category: Undergraduate Thesis
Sub category: Skripsi/Undergraduate Thesis
Source: Undergraduate Thesis No. 02021918/INF/2019; Claudia Nathasia Jason (26415134)
Subjects: DATA MANAGEMENT

Files

Cover (pdf, 1.1 MB)
Abstract TOC (pdf, 627 KB)
Chapter 1 (pdf, 116 KB)
Chapter 2 (pdf, 215 KB)
Chapter 3 (pdf, 250 KB)
Chapter 4 (pdf, 672 KB)
Chapter 5 (pdf, 272 KB)
Conclusion (pdf, 119 KB)
References (pdf, 372 KB)
Appendices (pdf, 2.8 MB)