Currently, there are hardly any high-quality, modern, free, public voice activity detectors apart from the WebRTC Voice Activity Detector (link). WebRTC, though, is starting to show its age, and it suffers from many false positives.
Also, in some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, data is considered private / sensitive if it contains (i) a name or (ii) some private ID. …
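As a toy illustration of the redaction step described above (not from the original article): assuming anonymization boils down to masking names and ID-like tokens in transcripts, a minimal sketch could look like the following. The helpers `redact_ids` and `redact_names` are hypothetical; a production pipeline would rely on NER models and audio-transcript alignment rather than regexes and word lists.

```python
import re

def redact_ids(text: str) -> str:
    # Mask runs of 6+ consecutive digits (e.g. phone or passport numbers).
    # The length threshold is an assumption for illustration only.
    return re.sub(r"\b\d{6,}\b", "<ID>", text)

def redact_names(text: str, known_names: set) -> str:
    # Mask tokens found in a known-name list; a real system would use NER.
    return " ".join(
        "<NAME>" if word.strip(".,") in known_names else word
        for word in text.split()
    )
```

On real corpora, masking would also have to be propagated back to the audio (e.g. by silencing the aligned segments), which this text-only sketch does not cover.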
The channel with the aforementioned video is very underrated, but the author does not deal with ML. In general, when analyzing comparisons of ML accelerators, several things usually catch your eye:
Our models are on par with premium Google models and also really simple to use.
We are proud to announce that we have built from the ground up and released our high-quality (i.e. on par with premium Google models) speech-to-text models for the following languages:
You can find all of our models in our repository, together with usage examples and quality and performance benchmarks. We also invested some time in making our models as accessible as possible: you can try our examples as well as PyTorch, ONNX, and TensorFlow checkpoints. You can also load our models via TorchHub.
Originally published at https://thegradient.pub on March 28, 2020. All citations and references are preserved as they were in the original article. Medium also does not have a handy table-of-contents feature, so I will leave the original links as well. Where appropriate, I will provide a link to the corresponding part of the original article. I also provide links to more up-to-date benchmarks.
Speech-to-text (STT), also known as automatic speech recognition (ASR), has a long history and has made amazing progress over the past decade. Currently, it is often believed that only large corporations like Google, Facebook, or Baidu (or local state-backed monopolies for the Russian language) can provide deployable “in-the-wild” solutions. …
Finally we made it!
This is a very brief accompanying post for the release of Open STT / TTS v1.0.
In a nutshell:
Some time ago we were disappointed by the state (a more refined article coming soon) of STT in general (compared to Computer Vision for example), especially in Russian. …
Recently I read this trilogy of books (which is awesome, down-to-earth, gritty, and an instant classic, by the way). There is a very nice analogy that permeates the whole trilogy in one way or another (I just could not help sharing it):
Like hunters in a “dark forest”, a civilization can never be certain of an alien civilization’s true intentions. The extreme distance between stars creates an insurmountable “chain of suspicion” in which any two civilizations cannot communicate well enough to relieve mistrust, making conflict inevitable. Leaving a primitive civilization alone is not an option due to the exponential pace of technological change: a civilization you have detected might easily surpass your own technological level in a few centuries and become a threat. And if you have detected a civilization, then you have also confirmed that said civilization will eventually be able to detect you. …
If you do not pay the iron price, you know that someone paid it for you. It works like this in every aspect of life.
Originally posted on spark-in.me on May 1, 2019
This is an accompanying post for our release of the Russian Open Speech To Text (STT/ASR) Dataset. It is meant to be a bit light-hearted and tongue-in-cheek. All opinions are my own, and my colleagues’ opinions probably differ. This is a non-technical summary. …
This will be one of those articles where we do not publish “plug and play” code, but instead ramble endlessly about what we tried and failed at (or maybe not?) and, most importantly, how this fits into the broader picture.
Over the course of the last 3–6 months we have tried various models for basic NLP tasks (classification, sequence-to-sequence modeling, machine comprehension) for one morphologically rich language — Russian. This is kind of cool, because we can inherit the vast literature and codebase of modern NLP methods (RNNs, LSTMs, embeddings, PyTorch, FastText, LASER, to name a few).
But there are usually 3…
This is the first time we managed to win (i.e. take 1st place in) an ML competition together.
A small philosophical preamble.
We participated as the Profi.ru team (yours truly, plus Dmitry Voronin, who helped a bit) together with Sava Kalbachou (Lucidworks), my long-time pal from many competitions and a specialist in NLP.
It was a long, difficult, nervous and surprisingly rewarding journey for our team.
To sum up the whole experience in one line — the journey is more important than the destination. This competition taught us a lot about being persistent, trying new things, falling down, and standing up again.
I personally compiled / optimized 3 or 4 pipelines for this task before I could build the best one. Also, as usual, we found a crucial error one day before the finish line…
This is an executive summary of what we managed to do in approximately 2 months in the Profi.ru DS department (Profi.ru is one of the leading online service marketplaces in the CIS region) within a Semantic Factory project.
This article mostly focuses on comparing novel NLP techniques in an applied business setting and may serve as a guide if you plan to do something similar in your company/project.
In a nutshell — we had 2 objectives: