First thoughts on OpenAI GPT-3

If you’ve been on the internet in the last couple of years, it’s almost impossible not to have heard about OpenAI’s GPT-3, a language model that seems to understand and generate text almost as well as a human. While it is unlikely the technology will completely change the world, it might become a game changer in some domains.

A simple example

But let’s start with the basics. OpenAI offers the technology as an API, which is neither open nor free. However, since it is an API, it can be used by everyone, without needing to train expensive models or having any knowledge about machine learning or AI in general. All you have to do is send a request and get a response:

Request
curl https://api.openai.com/v1/engines/davinci/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer FOOOO" \
-d '{"prompt": "Best books about AI"}'
Response
{
    "id": "cmpl-GERzeJQ4lvqPk8SkZu4XMIuR",
    "object": "text_completion",
    "created": 1586839808,
    "model": "davinci:2020-05-03",
    "choices": [{
        "text": "In our field, there are so many great books. If you want to get started on AI, I highly recommend "Artificial Intelligence: A Modern Approach" by Stuart Russell and Peter Norvig (2nd Edition). It's a book that every serious AI student should read.",
        "index": 0,
        "logprobs": null,
        "finish_reason": "length"
    }]
}

A couple of things to note:

  • Using the API is as simple as making a curl request and getting back a JSON response. Of course, there are libraries for the main programming languages too.
  • This is a text-based API. Note that my query, “Best books about AI”, is just plain text. The response is also plain text. No complicated syntax, not too many parameters to tweak. Easy.

Use cases

There are many use cases for the technology, and creative people come up with new ones every day. Here are the main ones. In all the examples that follow, I have added the input text as normal, while the GPT-3 response is bolded.

Summarization

Basically, make a summary of a block of text. Note that I have added tl;dr: at the end of my text and GPT-3 understood that I needed a summary:

Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus.

Jupiter is primarily composed of hydrogen with a quarter of its mass being helium, though helium comprises only about a tenth of the number of molecules. It may also have a rocky core of heavier elements,[21] but like the other giant planets, Jupiter lacks a well-defined solid surface. Because of its rapid rotation, the planet's shape is that of an oblate spheroid (it has a slight but noticeable bulge around the equator).

tl;dr:

Jupiter is a gas giant, the largest in the solar system. It is composed mostly of hydrogen and helium. It is the third brightest object in the night sky, after the moon and Venus.

Classification

This is a tweet sentiment classifier
Tweet: "I loved the new Batman movie!"
Sentiment: Positive
###
Tweet: "I hate it when my phone battery dies ๐Ÿ’ข"
Sentiment: Negative
###
Tweet: "My day has been ๐Ÿ‘"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###

Tweet text
1. "I loved the new Batman movie!"
2. "I hate it when my phone battery dies ๐Ÿ’ข"
3. "My day has been ๐Ÿ‘"
4. "This is the link to the article"
5. "This new music video blew my mind"

Tweet sentiment ratings:
1. Positive
2. Negative
3. Positive
4. Neutral
5. Positive

###

Tweet text
1. "I can't stand homework"
2. "This sucks. I'm bored ๐Ÿ˜ "
3. "I can't wait for Halloween!!!"
4. "My cat is adorable โค๏ธโค๏ธ"
5. "I hate chocolate"

Tweet sentiment ratings:
1. Negative
2. Negative
3. Positive
4. Positive
5. Negative

In this example, I want to classify tweets as either positive or negative. Note that for this case, I had to do a bit more work to make sure GPT-3 got my intentions:

  • The first line (“This is a tweet sentiment classifier”) summarizes what I am trying to do.
  • The next lines show some examples. I list the specific tweet and provide an example classification. Then I use a separator (###) to hint that I have another example. Note that this is not special syntax; I can use whatever I want as a separator. What really matters is having some kind of divider between the entries.

The pattern of “show and tell” is important. Best results come when you both summarize what you want to do and provide a few examples.
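For reference, a multi-line prompt like the classifier above still travels as one string in the request's "prompt" field, with the line breaks encoded as \n. A minimal sketch (max_tokens and stop are standard request parameters; the values here are only illustrative):

curl https://api.openai.com/v1/engines/davinci/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer FOOOO" \
-d '{
  "prompt": "This is a tweet sentiment classifier\nTweet: \"I loved the new Batman movie!\"\nSentiment: Positive\n###\nTweet: \"This new music video blew my mind\"\nSentiment:",
  "max_tokens": 5,
  "stop": ["###"]
}'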

Idea generation

Blog topics dealing with daily life living on Mars

1. What would you miss most about life on Earth?

2. What would you miss least about life on Earth?

3. What would you want to bring with you from Earth to Mars?

4. What would you want to leave behind on Earth?

5. How will the Martian environment

Note that in this case, I broke my pattern and only told GPT-3 what I want, without giving any examples. It worked, but if the results had been bad, I could have provided a few example ideas to get better responses.

More examples

Here are a few more interesting (although not necessarily useful) examples:

Translate from English to French. Note how I need to provide some examples. GPT-3 is not a translator by default, so it must “understand” that it needs to translate.

English: I do not speak French.
French: Je ne parle pas français.

English: See you later!
French: À tout à l'heure!

English: Where is a good restaurant?
French: Où est un bon restaurant?

English: What rooms do you have available?
French: Quelles chambres avez-vous de disponible?

English: Where is the restroom?
French: Où est les toilettes?

Extract structured data from text:

There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy. There are also loheckles, which are a grayish blue fruit and are very tart, a little bit like a lemon. Pounits are a bright green color and are more savory than sweet. There are also plenty of loopnovas which are a neon pink flavor and taste like cotton candy. Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them.

Please make a table summarizing the fruits from Goocrux
| Fruit | Color | Flavor |
| Neoskizzles | Purple | Sweet |
| Loheckles | Grayish blue | Tart |
| Pounits | Bright green | Savory |
| Loopnovas | Neon pink | Cotton candy |
| Glowls | Pale orange | Sour |
| Other |

Extract keywords from text:

Text: Black-on-black ware is a 20th- and 21st-century pottery tradition developed by the Puebloan Native American ceramic artists in Northern New Mexico. Traditional reduction-fired blackware has been made for centuries by pueblo artists. Black-on-black ware of the past century is produced with a smooth surface, with the designs applied through selective burnishing or the application of refractory slip. Another style involves carving or incising designs and selectively polishing the raised areas. For generations several families from Kha'po Owingeh and P'ohwhóge Owingeh pueblos have been making black-on-black ware with the techniques passed down from matriarch potters. Artists from other pueblos have also produced black-on-black ware. Several contemporary artists have created works honoring the pottery of their ancestors.

Keywords: Pueblo, art, pottery, black, black ware

Write an SQL query:

Create a SQL request to find all users who live in California and have over 1000 credits:

SELECT * FROM users WHERE state = 'CA' AND credits > 1000

Movie to Emoji. In this case, it correctly emojified Spider-Man, then came up with its own idea of a movie, the Hulk:

Back to Future: 👨👴🚗🕒
Batman: 🤵🦇
Transformers: 🚗🤖
Wonder Woman: 👸🏻👸🏼👸🏽👸🏾👸🏿
Winnie the Pooh: 🐻🐼🐻
The Godfather: 👨👩👧🕵🏻‍♂️👲💥
Game of Thrones: 🏹🗡🗡🏹
Spider-Man: 🕷
The Incredible Hulk: 💪

Under the hood

The API can use one of four engines, which can be specified as a parameter of the call:

Name | Use cases
Davinci | Complex intent, cause and effect, summarization for audience
Curie | Language translation, complex classification, text sentiment, summarization
Babbage | Moderate classification, semantic search classification
Ada | Parsing text, simple classification, address correction, keywords

Basically, Davinci is the most powerful engine and they get a bit weaker as you go down the list. While in theory you could always use Davinci, the pricing differences are significant (more on this later). While the guidelines above can help, it’s usually a matter of experimenting with all engines for your specific task and then choosing the cheapest one that provides good results.
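Switching engines does not change anything else about the call – with this version of the API, the engine is selected by the URL path. A sketch of the earlier request pointed at Ada instead of Davinci:

curl https://api.openai.com/v1/engines/ada/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer FOOOO" \
-d '{"prompt": "Best books about AI"}'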

Another thing to keep in mind is that the best inputs are “show and tell”. You have to both state what you want and provide a few examples. While I got good results for some queries with no examples (summary, keywords), in general you get better results by providing them. But make sure the examples are unbiased and as high quality as possible, as they directly influence the quality of your results. Engineering the proper inputs will probably be the most time-consuming activity when working with the API.

You can also tweak the “temperature”, which impacts how creative GPT-3 gets. For example, when asked for keywords for a body of text with a lower temperature, it will use words from the actual text. A higher temperature will suggest semantically related expressions/synonyms that might or might not be part of the text.
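Temperature is just another field in the request body. A minimal sketch, reusing the earlier request (max_tokens is another standard parameter, added here only to cap the length of the completion; the values are illustrative):

curl https://api.openai.com/v1/engines/davinci/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer FOOOO" \
-d '{"prompt": "Best books about AI", "temperature": 0.2, "max_tokens": 64}'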

Pricing

First, let me introduce the concept of a token – a token is a word or part of a word. For example, “pear” is a single token, while “Descartes” is three tokens: “Desc”, “art” and “es”. While there is no simple rule for how tokens are split, there are tools you can use to get or count the tokens.

Pricing is then per number of tokens and engine used. I won’t go into full details as it is subject to change, but roughly 1,000 tokens with the Davinci engine cost $0.05, while the same 1,000 tokens with Ada cost $0.0008. The huge price difference is an excellent motivator to try and use the cheapest engine that matches your requirements.
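To get a feel for the gap, here is a quick back-of-the-envelope calculation for a purely hypothetical workload of 50 requests per day at roughly 2,000 tokens each, using the rough prices quoted above:

# 50 requests/day * 2,000 tokens = 100,000 tokens/day
echo "Davinci: $(echo "scale=4; 100000 / 1000 * 0.05"   | bc) USD/day"   # ~5 USD/day
echo "Ada:     $(echo "scale=4; 100000 / 1000 * 0.0008" | bc) USD/day"   # ~0.08 USD/day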

Other than that, please note that OpenAI has allowed, “gray” and forbidden use cases. For example, you are not allowed to give medical or legal advice using their technology, which I guess is obvious. You are also not allowed to post automatically generated full articles as a business model, which again makes sense. So before you start implementing your business idea, make sure it’s a valid use case. While some restrictions seem a bit of a stretch, I can understand OpenAI’s policy given all the debates around AI & ethics.

Some business use cases in e-commerce

As an e-commerce developer, I found a few immediately applicable use cases:

  • Summarize product data to automatically generate relevant meta descriptions, keywords and ad copy. Note that in general, it works great in other languages too. I have run a few tests summarizing Romanian text and it worked great.
  • Automatically flag negative/suspicious reviews for moderation while approving the others automatically.
  • Extract product specifications from free-form text
  • Automate category/taxonomy/keyword assignment for products

While none of the use cases above will revolutionize e-commerce, they do make data processing cheaper and faster, which is a great win.

Closing thoughts

I fed the text of this article to GPT-3 and asked it to summarize it. See below – I think it did a great job!

OpenAI GPT-3 is a great technology that can be used to automate a lot of data processing. It is not a silver bullet, but it is a great tool that can be used to automate many tasks. The technology is not free, but it is not expensive either. The main cost is the time needed to get the results you want.

Decent WordPress workflows with WPEngine

Since I started doing WordPress work, I have been struggling a bit to make the dev workflow more predictable. The WordPress world does not seem to be very cautious in general; I have seen quite a few projects where the code was not versioned at all and updates were done via the admin, directly in production. Luckily, it seems that the folks at WPEngine cater to projects where a more structured workflow matters.

First of all, every WPEngine hosted website provides the option to add a dev and even a staging website – https://wpengine.com/support/environments/ . Even better, they offer an option to sync data back between the envs, i.e. you can pull the production database back to staging any time it lags behind. This alone is pretty cool; it means that with a few clicks, you have a “backup” where you or the client can test new ideas without creating a prod outage.

Moving on, you can “tie” each env to a git repo – https://wpengine.com/support/git/ . The process is pretty simple – add your public key, then get a full-featured repo for each environment. Whenever you push a branch, the content becomes visible on the env associated with that repo. It’s pretty fast too, adding only a few seconds for the build when pushing. If you use multiple branches, the last branch you push is the one that reflects the code changes, so you can work with feature branches and any time you push a new one it takes over.
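The day-to-day flow then looks roughly like this – note that the remote URL and install name below are placeholders; WPEngine shows the exact remote for each environment in its portal:

# add the remote WPEngine gives you for the environment, e.g. production
git remote add production git@git.wpengine.com:production/yourinstall.git

# push your current branch; the environment rebuilds in a few seconds
git push production my-feature-branch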

One thing that I never liked about WordPress is that it is complicated to avoid versioning the core. Since there is no Composer, you have to create a monstrous gitignore to exclude the core, which is scattered all over the place. However, WPEngine makes it pretty easy:

  • You use their gitignore to exclude anything core-related from your git repo (a simplified sketch of the idea follows this list).
  • You commit your own modules.
  • When the time comes for upgrades, you can do them locally and test. Because of the gitignore, there is nothing that needs to be committed.
  • You use their admin panel to do a one-click upgrade of WordPress.
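As an illustration only – this is a heavily simplified sketch, not WPEngine's actual ignore file, so use the one they provide – the idea is to ignore everything core-related and keep your own code:

# WordPress core stays out of the repo
/wp-admin/
/wp-includes/
/wp-*.php
/index.php
/license.txt
/readme.html

# generated or runtime content
/wp-content/uploads/
/wp-content/cache/
/wp-content/upgrade/

# everything else under wp-content (your own themes and plugins) gets committed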

While I think their WP version is modified a bit for some extra features, I never ran into an issue where something worked locally but not on their hosting. And anyway, you can run the process on the dev env first just to be sure.

Finally, a pleasant surprise is that everything is quick. Builds take a few seconds, upgrades less than one minute. And most of the time, it just works. When it did not, their support responded on chat in a few minutes and helped almost in real time.

Of course, if you don’t care about dev envs/git repos you can just ignore all those WPEngine features and keep working directly in production. They did a good job of making all the steps a choice, not a requirement.

I’m glad services like WPEngine exist, they can really help in having a structured deploy process in just a few minutes.

Magento 2 code quality tools

If you are planning to publish an extension to Magento’s marketplace, it will have to pass a few quality gates before getting published. Magento does provide some tools that allow you to test locally before submitting the extension.

First, there is https://github.com/magento/magento-coding-standard . While this tool is mostly centered around code formatting, it can catch some nastier stuff too, like loading a model in a loop or using deprecated libraries. Assuming that you will be testing your extension in the context of a Magento website, you first need to add this to your composer.json file, somewhere at root level:

"scripts": {
    "post-install-cmd": [
        "([ $COMPOSER_DEV_MODE -eq 0 ] || vendor/bin/phpcs --config-set installed_paths ../../magento/magento-coding-standard/)"
    ],
    "post-update-cmd": [
        "([ $COMPOSER_DEV_MODE -eq 0 ] || vendor/bin/phpcs --config-set installed_paths ../../magento/magento-coding-standard/)"
    ]
}

Then install the coding standards package:

composer require --dev magento/magento-coding-standard

Then run the checker:

vendor/bin/phpcs --standard=Magento2 app/code/My/Extension

This will provide an output like:

FOUND 0 ERRORS AND 16 WARNINGS AFFECTING 12 LINES
--------------------------------------------------------------------------------------------------------------------------------
  7 | WARNING | [x] Opening brace of a class must be on the line after the definition
 11 | WARNING | [x] The first parameter of a multi-line function declaration must be on the line after the opening bracket
 12 | WARNING | [x] Multi-line function declaration not indented correctly; expected 8 spaces but found 32
 13 | WARNING | [x] Multi-line function declaration not indented correctly; expected 8 spaces but found 32
 13 | WARNING | [x] The closing parenthesis of a multi-line function declaration must be on a new line
 14 | WARNING | [x] The closing parenthesis and the opening brace of a multi-line function declaration must be on the same line
 45 | WARNING | [ ] Expected "if (...) {\n"; found "if(...) {\n"
 45 | WARNING | [x] Expected 1 space(s) after IF keyword; 0 found
 45 | WARNING | [x] No space found after comma in function call
 47 | WARNING | [x] Expected 1 space after closing brace; newline found
 48 | WARNING | [ ] Expected "} else {\n"; found "}\n        else {\n"
 55 | WARNING | [ ] Line exceeds 120 characters; contains 125 characters
 56 | WARNING | [x] No space found after comma in function call
 56 | WARNING | [x] No space found after comma in function call
 59 | WARNING | [ ] Code must not contain multiple empty lines in a row; found 2 empty lines.
 64 | WARNING | [x] Expected 1 newline at end of file; 0 found
--------------------------------------------------------------------------------------------------------------------------------
PHPCBF CAN FIX THE 12 MARKED SNIFF VIOLATIONS AUTOMATICALLY
--------------------------------------------------------------------------------------------------------------------------------

You can fix some of the issues automatically:

vendor/bin/phpcbf --standard=Magento2 app/code/My/Extension

Finally, Magento also offers https://github.com/magento/marketplace-tools . This is not a quality tool per se, just a little helper to check that the archive you are about to upload is correct. Use it like:

./validate_m2_package.php app/code/My/Extension/Extension.zip

While Magento runs more checks against your extension, so it can still get rejected, the above should catch the obvious issues before you start the submission process. I also find these tools useful for quickly checking the quality of a 3rd party extension.

 

NodeJS – doing external calls in parallel

I am still getting my head around Node’s async model. The specific problem I was tackling was calling a paged API endpoint. In most programming languages, you’d call the first page, wait for the data, then call the second and so on. It turns out that there is a better way to do this in Node. Start by defining a few initial variables:

const fetch = require("node-fetch");
let currentPage = 1;
const queryUrl = 'http://local.dev/api.php';
let data = [];

Then, create an array of all the URLs you want to call. This means you need to know the number of pages in advance, which can be done with a single initial call. For simplicity, I assume I already know the number of pages:

let urls = [];
while(currentPage <= 5) {
  urls.push(queryUrl+'?page='+currentPage);
  currentPage++;
}

Now, just do the calls. Since we are getting back promises, this code will not wait for each individual call to finish:

const promises = urls.map(async url => {
  try {
    console.log("Querying " + url);
    const response = await fetch(url);
    return response.json();
  } catch (err) {
    console.log(err);
    // return an empty array so a failed page does not add `undefined` to the results
    return [];
  }
});

Finally, assemble back the data returned by each promise:

for (const promise of promises) {
  data = data.concat(await promise);
}

The data array contains all the info, in the correct page order even if the API calls responded in a different order, because the promises are awaited in the order the URLs were created. Neat.

On compromise

As software developers, we often need to compromise. Budgets run short, requirements change, launch dates must be honored. Everyone who has been in the field for more than a couple of years has plenty of examples of rushing through a project, cutting corners, skipping testing and generally delivering a suboptimal solution to hit time or budget constraints.

But how much compromise is enough? Hard to tell, as software development is not that regulated. In other fields, there is a term called “malpractice”, which means that the professional (and not necessarily the company that hires them) is legally responsible for certain types of mistakes. This ensures, for example, that no surgeon does half the surgery because the manager wanted it done quicker. They are liable for their work; finger pointing will not make the consequences go away. Now, luckily, as an e-commerce developer, I cannot kill anyone with my mistakes. But I can lose millions of dollars. I can even make a company go under.

That’s why, a while ago, I decided that there are some lines I will not cross, no matter the pressure. I would go as far as losing the job/project instead of compromising. But let me go back a bit. The first thing you need to understand as a professional developer is your role as a professional. Basically, you are responsible for:

  • Defining the architecture. The architecture must work. There is no point in proposing something that’s flawed from the start. A cheap car that does not work does not have any value.
  • Estimating. Project Managers or CEOs have wishes on when to launch, but you are responsible for telling them whether it can be done. It might not be possible, and that’s ok. You are also responsible for NOT estimating if you cannot do it. When you go to a physician with an illness that does not have a definite timespan, the physician will not promise you will be cured in one month. They will tell you to follow a treatment and check in from time to time. It will be done when it will be done. The worst words a project manager wants to hear, but sometimes that’s the truth.
  • Letting everyone know if you discover you were wrong. That’s a hard discussion to have, but keep in mind that everyone’s goal is a working solution. There might be consequences for you, but remember you’re the professional and must act accordingly.

Now, back to compromises. It’s hard to tell what really puts the project at risk and what’s not a big deal, especially under pressure. Personally, I have compiled a list of things that I do not compromise on:

  • Always reproduce the bug first. This might be very complex, but if I cannot reproduce it, how can I check my solution?
  • Always test all use-cases myself. While one might decide that QA by the dedicated team can be skipped, I am never skipping my own testing.
  • Always take a moment to think through the security implications. Never leave a security flaw in place with the note to fix it later.
  • Never cut a corner that I know I cannot uncut. It’s ok to write messier code under pressure, but only if there really is a way to go back and fix it.

I guess it ultimately boils down to being ok with the solution you provide. Not thrilled by it, not even happy, but at least ok with it. If a developer sees the code a few months later, they should be able to say that the solution is good enough.

There are of course a lot of other things I can compromise on:

  • putting in extra hours to see the project through.
  • implementing a suboptimal solution knowing that I can go back and fix it later. Of course, with the client’s approval/understanding of the compromise.
  • hard coding, code duplication, coding standards, not-so-performant solutions and everything else related to code quality, as long as the solution is still ok-ish.

Even the above compromises do not play out well long-term. While they will not do major damage at any specific point in time, they add tech debt that makes the project harder and harder to work on. Each change becomes more expensive and error-prone. If the client is always in rush mode, it’s ok to warn them a few times and, at some point, look for another project/job. Leaving a project is never easy, but I prefer that to knowing I was the guy who slowly pushed it past the point of no return.

A case against one step checkouts

Magento 2 provides a versatile checkout process, mainly consisting of two steps – shipping and billing. For whatever reasons, a fair number of merchants try to get away from it and use a one step checkout – a checkout that combines both steps into one. The reasoning seems to be that presenting all the information at once makes it easier to check out.

However, I have seen a lot of merchants who invested in such a checkout, only to revert back to the original one after a while. I think several reasons contribute to this.

Usability

  • combining more steps into one means more fields on the same screen. A wizard-like approach is cleaner.
  • the customer still needs to fill in the same number of fields.
  • unexpected changes in fields. One might fill in the card details, then change the shipping address to a region where you don’t take cards, and their payment info must be re-entered.

Implementation

  • A lot more ajax requests. This can be mitigated by a proper implementation, but that’s not always the case.

Maintenance

  • You open up a lot more ways one can check out, making testing more difficult.
  • All one step checkout extensions are a bit heavy, so Magento upgrades become harder.
  • Other checkout-related extensions might not work without even more changes.

Unknown

  • There must be a reason why Magento (and even Shopify) ship with a multi-step checkout. I am not that familiar with the research that led them down this path, but I assume it was not accidental.

On the other hand, I am curious whether you have more information on when a one step checkout is a good solution for Magento.

Logging sftp activity

Logging SFTP activity can be done (on most Linux systems) by editing /etc/ssh/sshd_config. Simply find:

Subsystem sftp /usr/libexec/openssh/sftp-server

And change to:

Subsystem sftp /usr/libexec/openssh/sftp-server -l INFO

Then restart the ssh daemon:

systemctl restart sshd

The INFO log level is just one of many – there are others, like VERBOSE and DEBUG – but INFO is usually a good compromise. To see the logs, simply tail /var/log/messages:

tail -f /var/log/messages | grep /the/directory/i/care/about

Composer artifacts – an easy way to include composer packages with no repos in your code

As a Magento developer, I always prefer to add 3rd party extensions using composer. Sadly, a fair number of vendors still provide archives with their modules instead of composer-ready repositories. Creating separate private repositories to hold each extension and then including them in my project seems like overkill, and it turns out there is a better solution for this use case – composer artifacts.

The idea is pretty simple – create an archive with your composer-ready code, add it to a directory in your main project, then simply ask composer to use it.

As an example, let’s assume we have a fictional module that has the following composer.json file:

{
    "name": "paul/artifact",
    "description": "this is an artifact",
    "minimum-stability": "stable",
    "version": "1.0",
    "license": "proprietary",
    "authors": [
        {
            "name": "test",
            "email": "email@example.com"
        }
    ]
}

The only part to keep in mind is the name of the package, paul/artifact in this case. To use it, create a zip archive of the code (including the composer.json file), then add it to your project in a subdirectory, e.g. artifacts/. The name of the zip archive itself is irrelevant.
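Creating the archive itself is nothing special; from the module’s directory, something like this will do (the paths are placeholders for your own layout):

cd path/to/paul-artifact
zip -r ../my-magento-project/artifacts/paul-artifact-1.0.zip . -x ".git/*"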

In your main project, you can make composer aware of the artifacts directory by adding the following new repository type:

"repositories": [
    .....
    {
      "type": "artifact",
      "url": "artifacts"
    }
  ],

The artifact type is what makes composer search for packages as zip archives instead of pulling from source control repos. The “url” is the name of the directory where the zip archives live, relative to your project root.

Once this is done, you simply require the package as you always do:

"require": {
    "paul/artifact": "1.*"
  }

This brings in all of composer’s features – it will install dependencies, lock versions etc. There is virtually no difference between an artifact package and any other type of package.

A small gotcha – if you want to update the package (say you zipped up a new version), don’t forget to clear composer’s cache (by running composer clear-cache). Composer caches artifacts exactly as it caches remote repos, so if you are not changing version numbers you have to intentionally clear the cache so that your new version is picked up.
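So after replacing the zip, the refresh boils down to something like:

composer clear-cache
composer update paul/artifact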

Hope this saves you some time.

The current state of the Magento Marketplace

One of the best reasons to use Magento is the community extensions. Free or paid, they augment Magento’s core features and save a lot of development time. Magento has always supported this effort by offering a curated extension list. In the old days, this was called Magento Connect. It was simply an extension directory; Magento did not intermediate support or purchasing of the extension. It still had value, though, as it included reviews which, of course, were more relevant than the 5-star reviews on the vendor’s site.

A short history of the marketplace

Magento Connect had a lot of downsides though. The approval time was very long. All the focus was on the marketing: you had to have pretty images and descriptions, but you could get away with very low quality, or even broken, code. Worst of all, Magento heavily promoted the idea that you could go to the marketplace, get an extension, upload it via SFTP to your server and use it. Magento was not (and isn’t to date) that kind of system. This resulted in a large number of bad reviews from non-technical merchants (“This extension breaks your site”, “I am not able to upload, PLS HELP!”, “Not good! Do not USE!”).

Magento’s approach frustrated the open source community. It’s one thing to charge for an extension and provide support to merchants that cannot afford a developer. It’s a totally different thing to provide free software but have to deal with installing the extension and fixing incompatible merchant stores. This resulted in a large number of open source developers simply pulling their extensions from Magento Connect and keeping them on Github only, where the audience is more technical. I found myself searching for an extension on Github first and on Connect second, which defeated the whole purpose of having an extension directory.

Fast forward to Magento 2: the approach changed completely and Magento Connect was replaced by the Magento Marketplace. There were major improvements right from the start:

  • Magento intermediates the purchase (and charges a fee to the extension vendor). Now I can at least turn to Magento for a refund instead of obscure vendors.
  • You can no longer post a review unless you actually purchased the extension.
  • Better experience overall (more relevant searches, more relevant extension description pages to name a few).

What did not improve from the start was the extension quality. Actually, initially the quality was worse than on the old Magento Connect. Magento probably needed a large number of extensions to offer, so they accepted pretty much anything (as long as you had pretty pictures and descriptions!). Vendors tried to release their Magento 1 extensions for Magento 2 ASAP, ignoring all coding standards and architectural approaches.

Luckily, Magento worked hard and improved this. Here is how the current process looks:

[Screenshot: the Marketplace submission process]

First, there are now two parallel, completely separate tracks – Marketing and Technical.

Marketing

The marketing flow is about the pretty pictures and description. Magento actually made this part really useful…

  • you need to add screenshots. Really important, as they are the easiest way to quickly understand whether the extension does what you need it to do.
  • the description has stricter guidelines so that it’s focused on the feature set, not on the sales pitch.
  • you have to submit an installation and user manual.
  • you have to clearly specify Magento version compatibility and pricing, including whether the extension relies on other third party subscriptions.

Technical

This is very important for a developer/agency. Personally, I try my best to write good quality code in my projects. Then the merchant buys an extension from the official marketplace and I am puzzled at how low quality it is. Or at least, this is how it used to work. Now there is a technical review. It has three main steps:

  • an automated code sniffer. It catches quite a few things – it even caught a few issues in my code, and I consider myself “seasoned”. While it’s still just an automated checker, you cannot make blatant mistakes anymore.
  • a Varnish test. Basically, a check that the extension does not break caching. I have had to ask for refunds on extensions in the past because their architecture simply disabled caching.
  • a manual QA test. While I am not sure what happens there, I like to think that a real person actually checks the basic flows of the extension and maybe looks over the code too.

I am sure the above works. First, there is no way to bypass the review process: if the automated code tester finds issues, you have to fix them. Second, I can simply feel how the quality has increased. It’s becoming the exception that I buy an extension and have a bad experience. Actually, I currently only use the marketplace to buy extensions, as I trust Magento’s reviews. At least for me, the Magento-owned marketplace concept finally worked.

Why is it better

Besides the above, there are a few not-so-obvious improvements that really helped:

  • the manual review team seems better trained. I have not had my extensions rejected for silly reasons in a while.
  • the review process is faster. No more waiting for months.
  • the documentation on how the submission should look is better, at least on the marketing side.

What’s still missing

While the extension marketplace is better, it’s still a long way from great, imho. Here is what I’d like to see in the future:

  • A Magento-owned demo site so I can check the extension before buying. The vendors take care of the demos now, but not all of them do it properly.
  • A git repo for each extension. Being able to see the commit history helps me a lot.
  • Magento should own security issues. Sadly, vendors do a poor job of communicating security issues. I’d like to be able to submit a form to Magento when I find one, Magento should work with the vendor to correct it, and then all extension owners should be notified. This is left at the discretion of the vendor now. Most of them simply release a new version but forget about a patch, or even about mentioning that the upgrade fixes a critical security issue.
  • As an extension vendor, I’d love to see subscription-based pricing.

Conclusion

In the last year, I have started to trust the marketplace as an authoritative entity in the extension world. While there are a few things to improve, Magento is definitely moving in the right direction. I expect that by the end of 2019, we will have an even better marketplace.

Magento integrations using Xtento

One of the most common tasks I have to deal with in Magento is integrations. Almost any system I have worked with needs to push order or product data to an ERP or marketing software, or get stock/product info/tracking numbers from an external source. Every integration is unique in a few ways…

  • the communication channel. REST, SOAP, sending files via SFTP are common variations
  • the data format. JSON, XML, CSV to name a few
  • the actual data mapping. We either pass fields as-is or combine/transform them.

However, any good integration has a lot in common:

  • a log must exist so we can refer back to when and how each piece of information was synchronized
  • errors must be logged, with an alert system so we are aware of any failures
  • the system should be able to queue data in case the integration is down. We cannot lose info especially when dealing with financial records.
  • the system must be able to retry failed records
  • the actual field mapping must be easy to change, ideally without changing code

I have been looking for a good solution for building integrations for a while. On the lower end, there is the do-it-yourself custom shell script: easy to build, but usually missing key elements like retries or flexible data mapping. On the higher end, we have full ETL solutions. They tend to be expensive and add new software components to the mix.

Almost accidentally, I stumbled upon Xtento’s Import/Export suite – https://www.xtento.com/magento-2-extensions.html . They flew under my radar because I had bad experiences in the past with such extensions and had, for a while, concluded that the best import/export is the one you build on top of Magento’s defaults.

Let’s go over the steps involved in exporting orders from Magento via Xtento. First, you start by defining destinations. A destination is the place where your orders will go. You have a few options:

  • on the local server in a specified folder
  • on a remote server via a SFTP or FTP connection
  • to an email
  • to an HTTP server (i.e. REST)
  • to a web service. Use this for XML-RPC or SOAP integrations
  • via a custom class, where you can define the communication with your exotic system

The above options should cover any possible target. One nice thing is that you can have multiple destinations, so you could place the orders on an SFTP server but also email a copy of the files to yourself for later review.

After defining the destinations, the next step is to define the export profile. Most options are obvious, so I will only go over the important ones:

  • you can choose the entity to export, i.e. orders. Usually each exportable entity corresponds to a specific extension that you need to buy from Xtento.
  • you can define filters. For example, you can decide that you only want to export “processing” orders, keeping the “pending” ones in queue until they are reviewed.
  • you can define the export mapping as an XSLT template. This was the feature that sold me. XSLT templates allow you to be as expressive as you need to be. You can use all the standard fields, apply transformations, and use all the fields in the order and related tables (your custom ones too). All this with a nice autocomplete interface and a dead-simple way to test the template against an existing order. Once you get the hang of it, you almost never need to refer to the docs/examples – it’s that easy. If you do need help though, https://support.xtento.com/wiki/Magento_2_Extensions:Magento_Order_Export_Module#XSL_Template has you covered.
  • you can define when your export runs. Do you want to export orders on a schedule? How often? Do you export all orders from the last N hours or only what’s new? If the export process is time consuming, you have a CLI command to run it outside the main Magento cron.
  • you have a flexible manual export option in case you need to replay any missed exports, or simply test the process.

Everything comes with logging. You have filterable grids where you can see the exported entries and errors. You also have the option to store the export files locally for later review.

If you need to import entities, Xtento has you covered too. The process is very similar: you still have sources to pull from, profiles to define, the same logging capabilities and a way to map the data. In addition, for imports you have an actions section where you can define what happens when an entry is imported. For example, when you import a tracking number, you can have Xtento ship and bill the order automatically.

I should mention that Xtento does not currently offer a product import solution. You can import stock, but not product data. I’d love to see that in their offering sometime.

What I really like about the extensions is that they are developer-friendly. Almost everything in their system has a fallback to a custom class. You have a very exotic export destination? You can define a class to implement the communication logic. Need to map your data in a way the XSLT template does not support? You can define a class and a method just for that. Finally, having logs for all operations makes it easy to identify random issues. It scales ok too – I have been exporting/importing 100k records per day with no performance issues.

Here are a few use cases where I have used Xtento successfully, usually without writing a single line of code:

  • product exports to marketplaces/product aggregators. I still prefer specific extensions for big marketplaces, like Amazon or Google Shopping, but will use Xtento for the ones that have poor extensions or none at all.
  • pushing order data to ERPs and getting back tracking info, billing and shipping automatically. That’s a huge one – before Xtento, I used to spend a lot of time on this type of implementation.
  • pushing order and product data to marketing software, like mailing list managers.
  • importing stocks from an external source (usually ERP).

Xtento might lack all the bells and whistles of an ETL solution, but in an ecosystem where not everyone has a fortune to spend on an integration, their extension suite is more than decent for getting things done.