Strapper PHP Library

Strapper is an invention born out of wanting to never write open bracket, d-i-v, close bracket, space, space, open bracket, forward slash, d-i-v, close bracket… EVER AGAIN. Never, ever, ever again. $(#& HTML: that’s our philosophy, and for you front-end developers out there… learn PHP or go find another job. I know, I know, that’s a bold and possibly delusional statement. Nonetheless, let me introduce Strapper, the PHP library that writes HTML/CSS for you so you don’t have to.

Strapper is made up of a tree and branches. The tree is a rendering object that contains branches, which in turn contain more branches. In other words, it’s a tree. When we call the tree’s render() method, it loops through all of its branches and outputs the HTML from each one. The branches are each objects, and they all extend the base class Strap_RenderBranch.

The core concept of Strapper is to build up (grow) branches from their most basic HTML elements, such as a DIV. When we have a DIV, we can make a Bootstrap ROW. When we have an IMG branch, we can put an image into a Bootstrap component. Eventually, after enough building up, we have a full site section like a header or footer. And all the while we’re using the same simple branching system.
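To make that concrete, here is a hypothetical sketch of the branching pattern described above. Only the class name Strap_RenderBranch and the render() method come from Strapper itself; the constructor signature, the grow() method, and the attribute handling are my own guesses for illustration.

// Hypothetical sketch of the tree/branch pattern. Only Strap_RenderBranch
// and render() come from the description above; grow(), the constructor,
// and the attribute handling are assumptions for illustration.
class Strap_RenderBranch {

    protected $tag;
    protected $attributes;
    protected $branches = array();

    public function __construct($tag, array $attributes = array()) {
        $this->tag = $tag;
        $this->attributes = $attributes;
    }

    // Attach a child branch and return it, so we can keep growing
    public function grow(Strap_RenderBranch $branch) {
        $this->branches[] = $branch;
        return $branch;
    }

    // Output this branch's HTML, then recurse into its child branches
    public function render() {
        $attrs = '';
        foreach ($this->attributes as $name => $value) {
            $attrs .= ' ' . $name . '="' . htmlspecialchars($value) . '"';
        }
        print '<' . $this->tag . $attrs . '>';
        foreach ($this->branches as $branch) {
            $branch->render();
        }
        print '</' . $this->tag . '>';
    }
}

// Grow a DIV into a Bootstrap row holding a single column
$row = new Strap_RenderBranch('div', array('class' => 'row'));
$row->grow(new Strap_RenderBranch('div', array('class' => 'col-md-12')));
$row->render(); // <div class="row"><div class="col-md-12"></div></div>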

 

Learn PHP return, Learn PHP series

In this lesson we’ll learn the PHP return statement, which is commonly used in functions and methods to send a value back to the caller. You’ll notice many methods have a “return”, and the documentation for most methods and functions lists the return value(s). Being aware of what is returned is a critically important part of learning an API. Good documentation helps us a lot by saving us the trial and error of finding out what a method returns.

To understand what “return” means, picture that you have an object House stored in $house. Now let’s say there is a method in the House class whose job is to go and get the latest price for this house, and its name is get_price(). The get_price() method might be loading the price from the database where the house is stored, or it might already have the value stored in a class property such as $price. Because get_price() returns the value of the price, we will expect a float (numeric) value such as 145900.

Most commonly we will want to store the return value of a method. In this example we store the price in the variable $price.

$price = $house->get_price();

What if we are in a view or template file and want to output the price? We could use:

print $house->get_price();

Now let’s look at the get_price() method:

public function get_price() {
    return $this->price;
}
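Putting the pieces together, here is a minimal, self-contained version of the House example. The constructor is my own addition for illustration; as noted above, a real get_price() might load the price from the database instead.

// Minimal House class illustrating the return statement.
// The constructor is an addition for this example; a real get_price()
// might load the price from the database instead.
class House {

    private $price;

    public function __construct($price) {
        $this->price = $price;
    }

    public function get_price() {
        return $this->price;
    }
}

$house = new House(145900.00);
$price = $house->get_price(); // store the return value
print $house->get_price();    // or output it directly: 145900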

How to Make a Web Crawler Robot

We all know Google’s early success came from its search algorithm, and that’s what we still talk about most today. Behind the scenes, though, is GoogleBot scanning the internet and indexing URLs at a furious pace. For search engines, the ability to crawl the web, following links from one site to the next, is at the core of what they do. Why would the rest of us need a crawler? Well, let’s look at BuiltWith.com as an example of using a web crawler to extract valuable research data. With fees ranging from $500 to several thousand, BuiltWith.com has made a lucrative business out of selling access to data that it acquires through web crawling. At least we can presume they use web crawling; I doubt they are running a telemarketing boiler room and calling website owners all day to ask what CMS and plugins they are using. And that’s the point: without web crawling, getting large-scale data from thousands or millions of websites is nearly impossible.

Is Web Crawling Ethical?

There seems to be some debate about this. Generally speaking, the ethics of web crawling depend entirely on how it is done; it depends on the context. Are you crawling public sites? Are you abiding by robots.txt directives? Robots files are designed specifically to give developers and site owners a mechanism to control how robots behave. GoogleBot and other major crawlers will usually follow the standards, given the directions you set up in your robots files. You can easily block GoogleBot from indexing your site, for instance.
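As a quick illustration, blocking GoogleBot site-wide takes just two lines in a robots.txt at the site root (the /private/ path in the second rule is a placeholder for this example):

User-agent: Googlebot
Disallow: /

# Or keep all compliant crawlers out of one directory only
User-agent: *
Disallow: /private/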

More Importantly to Some, Is Web Crawling Powerful?

Web crawling robots search, find, and compile data, in a world where data can, and does, translate into dollars. So yes, web crawling is on some level a way to create money automatically. Obviously there is an investment, perhaps a significant one, in both building and operating a web crawler. But once built, it can work 24/7/365 with as much accuracy and speed as you can build into it. Which leads to the next question…

How on Earth do you Build a Web Crawler Robot?

The simple answer is you don’t, because there are so many already in existence that there is actually a top 50 web crawlers page over at Big Data Made Simple. The first thing I noticed about the list was: holy Java beans, that’s a lot of Java apps, with a healthy portion of C/C++ to go with it. Useless to me personally as a PHP dev, presuming I want to be able to hack and customize the deployment, which I probably do. That makes PHPCrawl the obvious choice. Another PHP option is OpenWebSpider. With OpenWebSpider I found the documentation a bit sparse; it seems cool for indexing pages like a search engine, but there was no sign of how we might customize the indexing to store custom data.
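Based on PHPCrawl’s documented pattern (you extend its PHPCrawler class and override handleDocumentInfo(), which fires once per crawled document), a minimal crawl might look like the sketch below. This is written from memory of the docs, so verify the method names against the version you actually install.

// Minimal PHPCrawl sketch: extend PHPCrawler and override
// handleDocumentInfo(). Method names are from PHPCrawl's docs;
// verify against the version you install.
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler {

    function handleDocumentInfo($DocInfo) {
        // $DocInfo carries the URL, HTTP status, page source, and more
        echo $DocInfo->url . " (" . $DocInfo->http_status_code . ")\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL("www.example.com");
$crawler->addContentTypeReceiveRule("#text/html#"); // only parse HTML
$crawler->setPageLimit(50); // keep the crawl small while testing
$crawler->go();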

And then Along Comes a Sphider

I felt Sphider deserved its own header because, let’s face it, when you think of crawling the web and secretly stashing away data for your own purposes, there is a certain Machiavellian air to it. And none of the other crawlers captures that better than Sphider. Designed primarily for search indexing, by default it seems Sphider won’t jump from one domain to another. That can be changed in its options. I like that it can be run from the command line as well.

Web Crawling so Easy a 16-Year-Old Can Do It?

One of the few code-and-PHP-only examples of web crawling I could find was How to Create a Simple Web Crawler on Subin’s Blog. Congrats to him for that, and the Simple HTML DOM library he’s using there is actually a really good choice for scraping. I’ve used it before myself for both scraping and data importing/manipulation. The core idea is scraping out the URLs, getting them into the format you need, and then following those URLs. As the article suggests, this can be very resource intensive, so you’ll need to put some limits on depth or on how many URLs you cover in a given crawl.
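Here is a minimal sketch of that follow-the-links idea using Simple HTML DOM (this is my own rough version of the approach, not the code from Subin’s article). The depth limit guards against the resource blowup mentioned above.

// Follow-the-links sketch using Simple HTML DOM. The depth limit
// and visited list keep the crawl from running away.
include("simple_html_dom.php");

function crawl($url, $depth, &$visited) {
    if ($depth <= 0 || isset($visited[$url])) {
        return;
    }
    $visited[$url] = true;

    $html = file_get_html($url);
    if (!$html) {
        return;
    }

    // Pull every anchor href off the page, jQuery-style
    foreach ($html->find('a') as $link) {
        $href = $link->href;
        if (strpos($href, 'http') === 0) { // absolute URLs only, for simplicity
            crawl($href, $depth - 1, $visited);
        }
    }
    $html->clear(); // free memory; Simple HTML DOM holds a lot otherwise
}

$visited = array();
crawl("http://www.example.com", 2, $visited);
print_r(array_keys($visited));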

Designing the Data Scrape

There wouldn’t be much point crawling unless we scraped data and stored it away. The question is how, and what? Well, both PHPCrawl and Simple HTML DOM make the job fairly easy. The latter uses a parsing approach similar to jQuery, where you can traverse the DOM. I’m not too familiar with PHPCrawl, but reading through the docs it seems to focus mainly on getting the URL data. Perhaps using them together would work best. It helps to know what information you want; for instance, we know we want to test what framework or CMS the site uses. One approach is to try to load /wp-admin: if it’s there, it’s safe to say it’s a WP site, and if it’s not, that usually means it isn’t. Some exceptions apply. For Drupal there are a few other ways to test: the version text file, the admin path, etc. Other systems leave other kinds of traces. What we really want to know is whether the site is WP, and if so, what plugins and theme it has installed.
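As a rough illustration of that /wp-admin check, something like the following could run against each crawled domain. The function name is made up for this example, and as noted above it’s a heuristic with exceptions, not a definitive test.

// Rough /wp-admin heuristic. Function name is invented for this
// example; a 200 after redirects usually means the WP login exists.
function looks_like_wordpress($domain) {
    $ch = curl_init("http://" . $domain . "/wp-admin/");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD-style request, skip the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // /wp-admin redirects to the login page
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return $status == 200;
}

var_dump(looks_like_wordpress("wordpress.org")); // bool(true), most likely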

Crawling Right Along Then

More to come later on how this project unfolds. What this initial research session showed is that it is possible, and even fairly well supported (especially for Java/C programmers), to build a web crawler robot using existing libraries. There is not as much information out there about the design or approach as I would have liked.

Loving Phalcon

Just discovered Phalcon and really loving it. I hesitated at first to try it because of the need to install Phalcon as a PHP extension, but that process took me under 30 minutes and was well worth it. I’d been trying to craft a fully custom PHP application for our ProblemPath.com domain when I realized that, at the very least, some help with routing would be nice. My first thought was CodeIgniter, the tried and trusted name in MVC, and the one I last used. It’s been a while in the CMS world for me, so I haven’t experienced Laravel, Symfony, etc., aside from using some Symfony components in the C5 CMS.

What I’m liking about Phalcon is the simplicity of the models and the flexibility in structure. Phalcon does not impose a directory structure: you can load up a fairly standard set of folders for models/views/controllers, but you can also vary away from it. Your main bootstrap file defines these, so you could, for instance, have multiple model directories, perhaps because you wanted to use a kind of modular approach.
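Those directories get registered in the bootstrap through Phalcon’s autoloader. A minimal sketch using Phalcon’s Loader::registerDirs() follows; the folder names are my own example of what a modular layout could look like.

// Bootstrap snippet: register directories with Phalcon's autoloader.
// Directory names are an example of a modular layout, not a convention.
$loader = new \Phalcon\Loader();
$loader->registerDirs(array(
    '../app/controllers/',
    '../app/models/',          // shared models
    '../app/billing/models/',  // a second, module-specific model directory
));
$loader->register();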

I’m finding the Phalcon docs quite good. The writing is clear and to the point, and the examples work. There is an example app, Invo, which I’m trying now; it’s on GitHub at https://github.com/phalcon/invo

To anybody shopping around for a framework, I’d say Phalcon is worth a look, especially if you’re really into speed: its claim to fame is that, as a C-compiled PHP extension, it’s the fastest PHP framework available.