CrAKeN
Posted May 1, 2017

With its Fathom JavaScript framework, Mozilla wants to extract meaning out of web pages and produce a more intelligent browser.

Positioned as a "mini language" for writing semantic extractors, Fathom is already in production in Firefox's Activity Stream web traffic tracker, where it picks out page descriptions, images, and other items, said Mozilla's Erik Rose. Still in an early stage of development, Fathom "enables Firefox to understand the structure and content of a web page," he said. The framework could be implemented in browsers, browser extensions, and server-side software.

Rose presented scenarios in which Firefox could understand pages the same way a person does. For example, the browser could recognize and follow a log-in link, provide hotkeys to dismiss popovers, hide superfluous navigation or header sections on small screens, and determine what to print without needing print stylesheets. These scenarios, he said, assume the browser can identify the meaningful parts of a page.

Echoing the much-touted semantic web, Rose cited previous attempts in this vein, such as semantic tags, the Resource Description Framework (RDF), and microformats. Fathom, meanwhile, is a data-flow language like Prolog. It extracts meaning from web pages, identifying parts like address forms, Previous/Next buttons, and the main textual content. DOM nodes are scored and extracted based on user-specified conditions, and a system of types and annotations expresses dependencies between scoring steps and controls state. Existing sets of scoring rules can be extended without having to edit them directly, so third-party refinements can be mixed in.

Fathom's rule sets are data that look like JavaScript function calls, but the calls are making annotations in a version of a syntax tree. "Today, that gets us automatic tuning of score constants," Rose said. "Tomorrow, it could get us automatic generation of rules themselves."
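
To make the scoring idea above more concrete, here is a rough toy sketch in plain JavaScript. It is not Fathom's API: the names `rule`, `runRules`, `best`, `baseRules`, and `extendedRules` are all hypothetical, and plain objects stand in for DOM nodes. It only illustrates the general pattern described in the article, where rules are data, each node's score is built up from the rules that match it, and a third party can extend a rule set without editing the original.

```javascript
// Toy model only. Plain objects stand in for DOM nodes, and none of these
// names (rule, runRules, best, baseRules) come from Fathom's real API.

// A rule is plain data: a label, a predicate saying which nodes it applies
// to, and a function returning a multiplier for the node's score.
function rule(name, matches, multiplier) {
  return { name, matches, multiplier };
}

// Every node starts with a score of 1; each matching rule multiplies it.
function runRules(rules, nodes) {
  return nodes.map((node) => {
    let score = 1;
    for (const r of rules) {
      if (r.matches(node)) score *= r.multiplier(node);
    }
    return { node, score };
  });
}

// Return the highest-scoring node under a given rule set.
function best(rules, nodes) {
  return runRules(rules, nodes).reduce((a, b) => (b.score > a.score ? b : a)).node;
}

// Made-up stand-ins for elements on a page.
const nodes = [
  { id: "nav", tag: "div", textLength: 40, className: "nav-links" },
  { id: "article", tag: "div", textLength: 1200, className: "post-body" },
  { id: "footer", tag: "div", textLength: 1500, className: "footer legal" },
];

// Base rule set: longer blocks of text are more likely to be main content.
const baseRules = [
  rule("longer text scores higher", (n) => n.tag === "div", (n) => Math.log(n.textLength + 1)),
];

// A third-party refinement mixed in without touching baseRules.
const extendedRules = [
  ...baseRules,
  rule("penalize footers", (n) => /footer/.test(n.className), () => 0.1),
];

const baseWinner = best(baseRules, nodes);
const extendedWinner = best(extendedRules, nodes);
console.log(baseWinner.id, extendedWinner.id); // prints "footer article"
```

With only the base rule, the long footer wins. Once the extra rule is mixed in, the article body comes out on top, and `baseRules` itself is left unchanged. Because the multipliers such as `0.1` live in data rather than in control flow, they are the kind of constant that a tuning process could adjust automatically.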