
Currently, there are hardly any high-quality, modern, free, public voice activity detectors apart from the WebRTC Voice Activity Detector. WebRTC, though, is starting to show its age and suffers from many false positives.

Also in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically personal data is considered to be private / sensitive if it contains (i) a name (ii) some private ID. …


Is upgrading to Ampere worth it?


Every time the essential question arises of whether or not to upgrade the cards in the server room, I look through similar articles and watch such videos.

The channel with the aforementioned video is very underrated, but its author does not deal with ML. In general, when you analyze comparisons of accelerators for ML, several things usually catch the eye:

  • The authors usually take into account only the “adequacy” for the market of new cards in the United States;
  • The ratings are out of touch with ordinary users and are made on very standard networks (which is probably good overall) without details;
  • The popular…


Our models are on par with premium Google models and also really simple to use.


We are proud to announce that we have built from the ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:

  • English;
  • German;
  • Spanish;

You can find all of our models in our repository together with examples, quality and performance benchmarks. We also invested some time in making our models as accessible as possible: you can try our examples as well as the PyTorch, ONNX, and TensorFlow checkpoints. You can also load our model via TorchHub.


Making progress towards useful and practical STT using a combination of existing ideas and techniques

Originally published at https://thegradient.pub on March 28, 2020. All citations and references are preserved as they were in the original article. Medium also does not have any handy table-of-contents features, so I will leave the original links as well. Where appropriate, I will provide a link to the original part of the article. I also provide links to more up-to-date benchmarks.

  1. Introduction
  2. Related Work and Inspiration
  3. Open Speech To Text (Russian)
  4. Making a Great Speech To Text Model
  5. Model Benchmarks and Generalization Gap
  6. Further Work

Speech-to-text (STT), also known as automatic speech recognition (ASR), has a long history and has made amazing progress…


Finally we made it!

I wanted to give him an ushanka, but my paint editing skills are too poor

TLDR

This is a very brief accompanying post for the release of Open STT / TTS v1.0.

In a nutshell:

  • Open STT is published here, Open TTS is published here, noise dataset is published here;
  • We added 2 new datasets in 2 new large and diverse domains with around 15,000 hours of annotation;
  • New datasets have real speaker labels (which will soon be released);
  • Overall annotation quality is improved, and the most numerous annotation edge cases were fixed;
  • Dataset normalization is greatly improved.

Open STT recap

Some time ago we were disappointed by the state (a more refined article coming soon) of STT…


Do not fall prey to fallacies


Boring introduction

Recently I read this trilogy of books (which is awesome, down-to-earth, gritty and an instant classic, by the way). There is a very nice analogy that permeates the whole trilogy in one way or another (I just could not help sharing it):

Like hunters in a “dark forest”, a civilization can never be certain of an alien civilization’s true intentions. The extreme distance between stars creates an insurmountable “chain of suspicion” where any two civilizations cannot communicate well enough to relieve mistrust, making conflict inevitable. Leaving a primitive civilization alone is not an option due to the exponential progress of technological…


4000 hours of STT data in Russian


If you do not pay the iron price, you know someone paid it for you. It works like this in every aspect of life

Originally posted on spark-in.me on May 1, 2019

TLDR

This is an accompanying post for our release of the Russian Open Speech To Text (STT/ASR) Dataset. This is meant to be a bit light-hearted and tongue-in-cheek. All opinions are my own, and my colleagues' opinions probably differ. This is a non-technical summary. …


Deep FastText. You can also use this trick with transformers

Does it make sense to pre-train transformer-based models? Can you do better than BPE? Which architecture is better for which task?

This will be one of those articles where we do not publish “plug and play” code, but instead ramble endlessly about what we tried and where we failed (or maybe not?) and, most importantly, how this fits into the broader picture.

Over the course of the last 3–6 months we have tried various models for basic NLP tasks (like classification, sequence-to-sequence modeling, machine comprehension) for one morphologically rich language, Russian. This is kind of cool, because we can inherit the vast modern literature and codebase of modern NLP methods (RNNs, LSTMs, embeddings, PyTorch, FastText, LASER to name a few).

But there are…


Or building a task-agnostic seq2seq pipeline on a challenging domain


This is the first time we managed to win (i.e. 1st place) an ML competition together

A small philosophic preamble.

We participated as the Profi.ru team (yours truly, with a bit of help from Dmitry Voronin) together with Sava Kalbachou (Lucidworks), my old-time pal from many competitions and also a specialist in NLP.
It was a long, difficult, nerve-racking and surprisingly rewarding journey for our team.

To sum up the whole experience in one line: the journey is more important than the destination. This competition taught us a lot about being persistent, trying new things, falling down and standing up again.

I personally compiled…


Main objective: help our clients find what they want, ideally even before they type the whole query. Search should also generalize reasonably well (synonyms / forms of known words; Russian has rich morphology)

TLDR

This is an executive summary of what we managed to do in approximately 2 months in the Profi.ru DS department (Profi.ru is one of the leading online service marketplaces in the CIS region) within a Semantic factory project.

This article mostly focuses on comparing novel NLP techniques applicable in an applied business setting and may serve as a guide if you plan to do something similar in your company/project.

In a nutshell — we had 2 objectives:

  • Make search / routing within the main search bar on Profi.ru better (supervised task);
  • In doing so — develop unsupervised methods to efficiently search…

Alexander Veysov

Data Scientist
