[DATAES760] HANDLING OF DROPPED CONNECTIONS CREATED 06MAR20 UPDATED 31DEC20

[DATAES760] HANDLING OF DROPPED CONNECTIONS CREATED 06MAR20 UPDATED 31DEC20






[#DATAES-760] Handling of dropped connections

[DATAES-760] Handling of dropped connections Created: 06/Mar/20  Updated: 31/Dec/20  Resolved: 25/Apr/20

Status:

Closed

Project:

Spring Data Elasticsearch

Component/s:

Infrastructure

Affects Version/s:

None

Fix Version/s:

None


Type:

Improvement

Priority:

Minor

Reporter:

sothawo

Assignee:

sothawo

Resolution:

Won't Do

Votes:

0

Labels:

None

Remaining Estimate:

Not Specified

Time Spent:

Not Specified

Original Estimate:

Not Specified


Issue Links:

Depend

depends on

DATAES-751

Introduce ClientCallbackfor the rest ...

Closed



Last updater:

[DATAES760] HANDLING OF DROPPED CONNECTIONS CREATED 06MAR20 UPDATED 31DEC20 Spring Issues


 Description 

 

When using the REST clients (reactive and non-reactive) it can happen that the connection is lost after longer idle times.

 

This should be investigated how

  • this can be detected

  • what automatic recovery can be implemented

 

non-reative

For the non-reactive part this can be integrated in the execute method to have a retry mechanism once DATAES-751 is done

Edit:

non-reactive code might use for example Spring Retry to catch errors and retry to do the failed operation again, for reactive code, the different flavours of Mono.retry() or Flux.retry() can be used.



 Comments 

 

Comment by sothawo [ 08/Mar/20 ]

once the communication with the client is done in the execute method, connection error handling can be don in this centralized place

Comment by Mark Paluch [ 01/Apr/20 ]

Note that there is Spring Retry that might address some of these issues. Connection validity checking should be part of the HTTP client as there are tons of libraries that build on top of Spring. Having each of them implementing a valid check misses the point.

Comment by sothawo [ 01/Apr/20 ]

You mean to let the user handle this with spring-retry?

Comment by Mark Paluch [ 01/Apr/20 ]

Yeah, that's the alternative. Since we cannot make any kind of assumptions on whether retrying should be used or what the settings should be, we cannot do anything useful here. We discussed an approach that lives within `RestTemplate`/`WebClient` with the team but that is again something for Spring Retry itself. For the reactive case, a simple `retry(…)` operator should be a quick way out.

Comment by sothawo [ 01/Apr/20 ]

ok, so I think this issue can be closed. I'll have to find a different solution how I could handle connection problems during runs of the whole tests.

Comment by jvmlet [ 15/Apr/20 ]

Would you please include brief guidelines in documentation/FAQ about this and  how to configure rest template with retry if you decide to close this issue as 'Won't fix' 

Comment by jvmlet [ 16/Apr/20 ]

  Since we cannot make any kind of assumptions on whether retrying should be used or what the settings should be, we cannot do anything useful here.

I tend to disagree, Spring Boot is about conditional configuration with healthy defaults. Customizable.

The straightforward solution for this issue is to retry on Connection Reset  error.

This is how we detect this kind of error :

 

Exception ex = .....;
Predicate<Throwable> shoudRetryPredicate =IOException.class::isInstance
.and(t -> Optional.ofNullable(t.getMessage()).orElse("").matches("(?im).*connection\\s+reset\\s+by\\s+peer.*")
);

 

 

Comment by Mark Paluch [ 16/Apr/20 ]

Note that Spring Boot and Spring Data Elasticsearch different projects. Spring Boot depends on Spring Data Elasticsearch, not the other way round. I suggest filing a ticket in the Spring Boot issue tracker to provide retry support for web clients. Once these are in place, you should be able to configure retry behavior in Spring Boot.

Retrying on connection reset might be an assumption that works for your case. In other arrangements that it might be undesired behavior.

Comment by jvmlet [ 16/Apr/20 ]

spring-data-elasticsearch is shipped as spring-boot-starter, right ? It means  your auto-configuration  has full capabilities to create/configure/customize every single  bean in app context . 

Also, spring-data-elasticsearch's ElasticsearchRestTemplate seems to be using org.elasticsearch.client.RestHighLevelClient underneath , how retry support on  org.springframework.web.client.RestTemplate  can help here ?

Comment by Mark Paluch [ 16/Apr/20 ]

You're right about RestTemplate, this was an oversight, it should be the RestHighLevelClient indeed.

Spring Data Elasticsearch is shipped independently from Spring Boot. Spring Boot depends on (Spring Data) Elasticsearch.

Comment by sothawo [ 16/Apr/20 ]

I just had a deeper look at the org.elasticsearch.client.RestClient that is used by both the low level and the high level Elasticsearch clients. The RestClient has a built-in failure retry. From the Javadocs (https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L87-L91):

* The method {@link #performRequest(String, String, Map, HttpEntity, Header...)} allows to send a request to the cluster. When
* sending a request, a host gets selected out of the provided ones in a round-robin fashion. Failing hosts are marked dead and
* retried after a certain amount of time (minimum 1 minute, maximum 30 minutes), depending on how many times they previously
* failed (the more failures, the later they will be retried). In case of failures all of the alive nodes (or dead nodes that
* deserve a retry) are retried until one responds or none of them does, in which case an {@link IOException} will be thrown.

Every call done from the RestHighLevel client gets down the performRequest method. So if you have more than one node configured, failures will be handled.

 So when running in a local test or development mode with one node, you might encounter failures. For production use you shouldn't run with one node only anyhow. And if you have a loadbalancer or something like this fronting a multinode cluster, then this should handle failover.

Comment by jvmlet [ 16/Apr/20 ]

Would you consider PR that provides extendable failover support with configurable Retry, Noop (Propagate exception)  and custom implementation? 

Comment by sothawo [ 16/Apr/20 ]

Thinking about the change that was implemented with DATAES-751, you can add your own retry-handling pretty easily by providing a custom ElasticsearchRestTemplate:

 

@Configuration
public class RestClientConfig extends AbstractElasticsearchConfiguration {

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        ClientConfiguration clientConfiguration = ClientConfiguration.builder() //
            .connectedTo("localhost:9200") //
            .build();
        return RestClients.create(clientConfiguration).rest();
    }

    @Override
    public ElasticsearchOperations elasticsearchOperations(ElasticsearchConverter elasticsearchConverter) {
        return new ElasticsearchRestTemplate(elasticsearchClient(), elasticsearchConverter) {
            @Override
            public <T> T execute(ClientCallback<T> callback) {
                try {
                    return super.execute(callback);
                } catch (DataAccessResourceFailureException e) {
                    // retry 
                    return super.execute(callback);
                }
            }
        };
    }
}

As all the calls using the client are done via execute(ClientCallback<T>) now, you can just overwrite this method, call the base, handle error retry by calling the base implementation again. So no need for further changes.

Comment by jvmlet [ 19/Apr/20 ]

Thanks for the sample, sothawo,  but   ClientCallback<T>  has package private access, please expose as public.

Comment by sothawo [ 19/Apr/20 ]

That was recently fixed with DATAES-789

Comment by Spring Issues [ 31/Dec/20 ]

Spring Data has moved from Jira to GitHub Issues. This issue was migrated to spring-projects/spring-data-elasticsearch#1334.

Generated at Sun Dec 26 03:29:40 UTC 2021 using Jira 8.13.15#813015-sha1:24256f8a55853293c87d0465fdf0407f4975035a.





Tags: [dataes760], handling, dropped, created, 31dec20, updated, connections, 06mar20