Site Reliability Engineer #PFS860267
Company: Client of Professions
Work Location: Remote, Ho Chi Minh
Salary: From $1,500, negotiable
Roles and Skills:
Job Overview and Responsibilities
Responsibilities:
- Help with problem resolution and incident firefighting
- Cross-system monitoring automation and anomaly detection
- Ensure high availability of our solutions, with backup processes where needed, e.g. Scoring OTP, automated backup process
- Chaos engineering
- Help solve technical issues that span teams and departments
- System stability: automate fallback approaches (feature toggles)
- System monitoring and alerting (API, DB…)
- System capacity planning (from servers to DB and infrastructure)
- Site stability statistics and actions: traffic count, response time, capacity, error rate and specific errors, fallback strategy (proper use of Hystrix and fail-fast)
What You’ll Do:
- Work with functional teams to provide system observability, stability and visualization
- You are responsible for keeping our infrastructure humming as new releases and maintenance updates are rolled out
- You will help organize, secure, and automate existing infrastructure and deployments
- You will work closely with developers to provide feedback and drive operational improvements within our products and operations infrastructure
- You will be responsible for ensuring that our platform is stable and balanced
- Maintain high site uptime while embracing rapid change and growth
- Scale infrastructure to meet increasing demand and evolving technology
- Help the dev teams working on our codebases achieve zero-downtime deployments
- Develop and improve operational practices and procedures
- You will coordinate and participate in on-call rotations
Required Skills and Experience
Need-to-have areas
- Good knowledge of the technologies (or related technologies) in the categories below:
- At least 5 years of experience with Linux server-based technology
- Excellent English communication and documentation skills
- Any certification in a related field (e.g. Kubernetes administrator, Azure), or equivalent experience, is a plus
- Strong Bash shell scripting skills
- Experience with CI/CD tools: Jenkins, GitLab
- Production-grade Kubernetes administration
- Infrastructure as Code: Azure Resource Manager, Terraform
- System configuration tools: Ansible, Chef
- Container & Container Orchestration: Kubernetes, Docker
- Monitoring & Logging: Prometheus, Grafana, Elasticsearch, Splunk
- Middleware & Cache: Kafka, RabbitMQ, Redis
Need-to-have areas:
- Ability to work independently and under pressure
- Independent problem-solving, self-direction
- Ability to concentrate and pay close attention to detail
- Friendly team player
- Accountability on the job and willingness to give feedback
- Ethics and integrity
- Welcomes challenges and is willing to learn and apply new technologies
- You are lazy and would love to automate anything
- Professional English communication, both verbal and written
Why candidates should apply for this position
Understanding candidates’ expectations of the ideal workplace, we always put the people element at the top of our priorities.
We invest significantly in compensation programs and employee development opportunities, bringing you:
Primary benefits:
- 13th-month salary and performance-based KPI bonus
- 15+ days of annual leave
- Full Social Insurance, 24/7 Accidental Insurance, Annual Medical Check-up
- Team Building and CSR activities: Year-end Party, New-year Party, Company trip, Charity activities, Blood donation
- Learning workshops: Udemy E-learning, English courses, Senior management development training programs
Our culture fosters your career development through:
- Strategy
  - Thinking Big: We focus on creating meaningful and sustainable opportunities for the company.
  - Customer Obsession: We try to understand future customer needs, trends, and the impact of digital transformation.
- Implementation
  - Digital Savviness: We seek to understand the technological aspects of the business to enhance efficiency and customer experience.
  - Entrepreneurship: We communicate the company’s strategy and purpose to others, and inspire and lead them effectively.
  - Risk in Mind: We consider risks and their possible effects before making decisions.
  - Operational Excellence: We stay focused on the goal and are disciplined in seeing tasks through to the end.
- Building the organization
  - People Centricity: We sustain focus on the development of team members through meaningful development plans and learning opportunities.
  - Integrity: We always stay focused on the goal we need to achieve and refrain from getting into personal conflicts.
Preferred if candidates have
Nice-to-have areas
- Up to date with new technology trends (DevOps, cloud computing, big data)
- Intermediate reporting skills
- Good documentation skills
- Ability to join projects as a hands-on contributor
- Experience with Python or Java is a plus
- Has participated in the full SDLC
- Experience with microservices design patterns
Nice-to-have areas
- Strong service mentality, “can-do” attitude, strong drive to succeed
- Ability to work in a dynamic environment and provide recommendations to improve operations
Reporting to
Team Leader
Interviewing Process
2 rounds with PIC