Sunday, November 26, 2023

COCOMO II - Software Cost - Effort Estimation

 


https://boehmcsse.org/tools/cocomo-ii/


COnstructive COst MOdel II (COCOMO® II) is a model that allows one to estimate the cost, effort, and schedule when planning a new software development activity.


It consists of three submodels, each offering greater fidelity the further along one is in the project planning and design process. In order of increasing fidelity, these submodels are:


Applications Composition model

Early Design model

Post-architecture model
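
As a rough illustration of what the Post-architecture model computes, the sketch below applies the commonly published COCOMO II.2000 form: effort PM = A × Size^E × ∏EM with E = B + 0.01 × ΣSF, and schedule TDEV = C × PM^(D + 0.2 × (E − B)). The constants (A = 2.94, B = 0.91, C = 3.67, D = 0.28) are the COCOMO II.2000 calibration as commonly cited, not values given in the text above. The function name, the sample ratings, and the labor rate are hypothetical, and the sketch assumes a nominal schedule (no compression or stretch-out).

```python
import math

# COCOMO II.2000 calibration constants (an assumption; not stated in this post).
A, B = 2.94, 0.91   # effort equation
C, D = 3.67, 0.28   # schedule equation


def cocomo2_estimate(ksloc, scale_factors, effort_multipliers, cost_per_pm=None):
    """Return (person_months, schedule_months, cost) for a nominal schedule.

    ksloc              -- size in thousands of source lines of code
    scale_factors      -- the five scale factor ratings (summed into the exponent)
    effort_multipliers -- cost driver ratings (multiplied together)
    cost_per_pm        -- optional labor rate per person-month (hypothetical input)
    """
    exponent = B + 0.01 * sum(scale_factors)
    person_months = A * ksloc ** exponent * math.prod(effort_multipliers)
    schedule_exponent = D + 0.2 * (exponent - B)
    schedule_months = C * person_months ** schedule_exponent
    cost = person_months * cost_per_pm if cost_per_pm is not None else None
    return person_months, schedule_months, cost


if __name__ == "__main__":
    # Illustrative inputs only: a 10 KSLOC project, made-up scale factor
    # ratings, all cost drivers at nominal (1.0), and a made-up labor rate.
    pm, tdev, cost = cocomo2_estimate(
        ksloc=10,
        scale_factors=[3.0, 3.0, 4.0, 3.0, 5.0],
        effort_multipliers=[1.0] * 17,
        cost_per_pm=10_000,
    )
    print(f"effort: {pm:.1f} person-months")
    print(f"schedule: {tdev:.1f} months")
    print(f"cost: {cost:,.0f}")
```

Because the exponent grows with the sum of the scale factors, effort rises faster than linearly with size whenever that sum exceeds 9 under this calibration. That makes the scale factor ratings one of the main levers to examine when negotiating tradeoffs.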


COCOMO II is useful for:


Making investment or other financial decisions involving a software development effort

Setting project budgets and schedules as a basis for planning and control

Deciding on or negotiating tradeoffs among software cost, schedule, functionality, performance or quality factors

Making software cost and schedule risk management decisions


COCOMO III is a project underway to update COCOMO II.


Boehm, Barry W.; Abts, Chris; Brown, A. Winsor; Chulani, Sunita; Clark, Bradford K.; Horowitz, Ellis; Madachy, Ray; Reifer, Donald; Steece, Burt. Software Cost Estimation with COCOMO II. Prentice Hall, 2000.



https://www.coursesidekick.com/management/349638



Software Project Management - LinkedIn Articles

 

https://www.linkedin.com/pulse/topics/it-services-s57547/software-project-management-s2157/


What are the most effective ways to track project budgets and costs remotely?

143 contributions

 1 day ago

Learn some of the most effective ways to track project budgets and costs remotely for your software project, using various tools, methods, and best practices.


How can you use velocity as a project control metric?

45 contributions

 10 hours ago

Learn how to use velocity, a measure of work completed, to control your software project by tracking progress, forecasting delivery, and adapting to changes.


How can you motivate team members to deliver high-quality work on a small budget?

37 contributions

 2 hours ago

Learn how to motivate your software team members to deliver high-quality work on a small budget with these six tips.


How can you ensure Scrum adapts to changing circumstances?

21 contributions

 18 minutes ago

Learn how to use the Scrum values, artifacts, events, roles, and experiments to ensure that your software team adapts to changing circumstances and delivers value.


How can you encourage team members to follow project documentation standards and frameworks?

17 contributions

 1 day ago

Learn how to explain the benefits, choose the right tools, involve the team, and monitor and review the documentation standards and frameworks for your software…


How can you design a project for optimal speed and time-to-market?

54 contributions

 1 day ago

Learn how to plan, communicate, iterate, and improve your software project for optimal speed and time-to-market with these six steps.


How can you mentor effectively in software project management without direct supervision?

32 contributions

 4 days ago

Learn how to mentor effectively in software project management without direct supervision. Discover tips on setting goals, using tools, adapting styles, providing…


What is the best way to ensure delegated tasks stay within budget?

33 contributions

 1 day ago

Learn how to set clear expectations, delegate to the right person, empower your team members, and review and evaluate the results of your delegated tasks within…


How can you improve your coordination skills in an agile environment?

6 contributions

 2 days ago

Learn how to plan, organize, communicate, and facilitate the work of multiple stakeholders in agile software projects. Improve your coordination skills with these…


How can you optimize project documentation for reuse and repurposing?

23 contributions

 1 day ago

Learn how to use templates, modularization, automation, collaboration, and review to optimize your project documentation for reuse and repurposing.


How can you interpret budget variances?

5 contributions

 3 days ago

Learn how to calculate, interpret, and manage budget variances in software projects using budget variance analysis, a key technique in project cost management.


How can you keep your team focused when using agile methodologies?

27 contributions

 5 days ago

Learn how to maintain your team's focus when using agile methodologies for software project management. Discover tips and best practices for communication…


How can you maintain a professional demeanor when resolving conflicts in a software project team?

37 contributions

 4 days ago

Learn how to maintain a professional demeanor when resolving conflicts in a software project team, and avoid some common pitfalls that can escalate or prolong the…


How can software project team members take ownership of their own training and development?

45 contributions

 6 hours ago

Learn how to take ownership of your own training and development as a software project team member. Discover how to assess, plan, apply, and review your learning.


What are the best ways to manage resources during project emergencies?

29 contributions

 4 days ago

Learn some best practices for resource management during project emergencies, such as communication, planning, monitoring, learning, and celebrating.


What are the most effective ways to integrate Six Sigma into software project management?

6 contributions

 4 days ago

Learn how to apply Six Sigma principles and tools to your software project management, from planning to testing. Improve your project quality and efficiency with…


What are the most common software project documentation challenges?

4 contributions

 11 hours ago

Learn about the most common software project documentation challenges and how to overcome them with tips and tricks for clarity, balance, consistency, motivation…


How can project documentation standards help with knowledge management?

14 contributions

 2 days ago

Learn how project documentation standards can help you capture, share, and use the knowledge generated by your software project. Find out how to implement them and…


How can you effectively document and track project requirements throughout the project lifecycle?

3 contributions

 18 hours ago

Learn best practices and tools for documenting and tracking project requirements throughout the project lifecycle, and deliver successful software projects.


How can you use project scope to prevent conflicts?

9 contributions

 5 days ago

Learn how to use project scope to prevent conflicts in software project management. Project scope is the definition of what your project will deliver and how.


How can you design a project for maximum flexibility?

260 contributions

 2 days ago

Learn how to plan, execute, and deliver a flexible software project using agile methods, modular architecture, and continuous delivery.


How can you negotiate software contracts that meet everyone's needs?

106 contributions

 5 days ago

Learn how to negotiate software contracts that meet the needs and interests of different stakeholders, avoid conflicts and disputes, and achieve win-win outcomes.


How can you use a flexible work environment to improve productivity in Software Project Management?

10 contributions

 5 days ago

Learn how a flexible work environment can improve productivity, quality, and satisfaction in software project management. Find out the challenges, best practices…


How can you recognize and reward employees in software project management?

63 contributions

 1 day ago

Learn how to boost motivation, productivity, retention, and innovation by recognizing and rewarding your employees in software project management. Discover how to…


What is the best way to use project documentation for testing and integration?

4 contributions

 6 days ago

Learn some best practices and tips to leverage your project documentation for testing and integration, two crucial phases of software development.


How can you monitor and evaluate software projects with a customer-centric approach?

16 contributions

 2 days ago

Learn how to use customer feedback and satisfaction as the main criteria for monitoring and evaluating your software projects, using practical tools and methods.


What are the best ways to mitigate risks associated with software project vendor performance?

37 contributions

 6 days ago

Learn best practices to reduce risks associated with software project vendor performance, such as defining requirements, communicating regularly, and evaluating…


How can you use project planning metrics to identify dependencies?

8 contributions

 6 days ago

Learn how to use project planning metrics, such as schedule variance, cost variance, schedule performance index, and cost performance index, to identify and manage…


What are the key responsibilities of a scrum master in an agile project?

60 contributions

 5 days ago

Learn about the main responsibilities of a scrum master in an agile project, and how they facilitate, support, and empower the team to deliver value to the customer.


What are the most effective ways to create user manuals?

27 contributions

 5 days ago

Learn some of the most effective ways to create user manuals that meet the needs and expectations of your users and stakeholders for your software projects.


What negotiation tactics can you use with unresponsive stakeholders?

3 contributions

 5 days ago

Learn negotiation tactics to deal with unresponsive stakeholders in software projects, such as communication, escalation, incentives, compromise, and involvement.


How can you build trust and rapport during conflict resolution in software project management?

12 contributions

 1 day ago

Learn how to build trust and rapport with your stakeholders while resolving conflicts in software project management. Discover five skills to help you achieve…


How can you support team members who struggle with decision making?

35 contributions

 3 days ago

Learn how to support team members who struggle with decision making in software projects. This article provides a clear framework and practical tips to help them…


How can you discuss your experience with Waterfall in a Software Project Management interview?

9 contributions

 3 days ago

Learn how to discuss your experience with Waterfall, a traditional software project management methodology, in an interview. Compare and contrast with agile…


How can you negotiate a feasible delivery date with a pushy stakeholder?

5 contributions

 5 days ago

Learn how to communicate effectively, manage expectations, and use agile methods to negotiate a feasible delivery date with a pushy stakeholder in software project…


How can you ensure project teams have access to necessary resources?

1 contribution

 3 weeks ago

Learn how to coordinate resources for software projects and avoid issues. This article covers tools and techniques for resource management.


What are the essential features of an earned value management tool?

8 contributions

 2 weeks ago

Learn the essential features of an EVM tool for software project management and how they can help you measure and improve your project performance.


How can you recover your project documentation in case of data loss or corruption?

2 contributions

 2 weeks ago

Learn how to recover your project documentation in case of data loss or corruption, and how to prevent it from happening again. Find out the best methods and tools…


What are the most effective strategies for developing your skills as a scrum master?

4 contributions

 6 days ago

Learn the most effective strategies for developing your skills as a scrum master and becoming more effective in software project management.


What is the best way to prioritize project features on a tight budget?

45 contributions

 4 days ago

Learn the best practices and methods to prioritize software project features on a tight budget, and deliver the most value to your clients and users.


What is the velocity metric and how can you use it to track your team's productivity?

7 contributions

 16 hours ago

Learn what velocity is, how to calculate it, and how to use it to measure and improve your software team's productivity in agile software development.


What are the best decision-making strategies for technical debt in agile projects?

54 contributions

 5 days ago

Learn the best decision-making strategies for technical debt in agile projects, based on agile principles and practices. Discover how to measure, prioritize…


How can you make project tracking more agile?

49 contributions

 5 days ago

Learn how to improve your project tracking practices in software project management by using a visual tool, tracking outcomes, using feedback loops, and…


How can you accurately estimate tasks despite uncertainty?

71 contributions

 2 weeks ago

Learn practical techniques and best practices to improve your task estimation skills and cope with uncertainty in software project management.


How can you design software projects that scale and grow?

33 contributions

 2 days ago

Learn how to plan, design, and manage software projects that can scale and grow with your business and user needs. Discover best practices and techniques for…


What are the best ways to identify and quickly resolve software defects in an agile development environment?

40 contributions

 1 day ago

Learn how to use test automation, code reviews, pair programming, feedback loops, and root cause analysis to find and fix software defects in agile projects.


How can you identify risks in an agile project?

27 contributions

 2 days ago

Learn how to use agile practices and tools to identify risks in your project, and how to document, track, communicate, and collaborate on them.


How can networking help you improve your Software Project Management skills?

74 contributions

 6 hours ago

Learn how networking can help you enhance your software project management skills by expanding your knowledge, finding mentors, getting feedback, and creating…


What are the best ways to demonstrate your adaptability in Software Project Management?

166 contributions

 2 days ago

Learn six best practices to demonstrate your adaptability in software project management, a key skill for delivering successful projects and getting a promotion.


How can you use change management to keep customers happy?

52 contributions

 6 days ago

Learn how to use change management principles and practices to deliver value, quality, and satisfaction to your customers throughout the software project lifecycle.


How do you connect your software project goals to career development?

7 contributions

 5 days ago

Learn how to align your software project goals with your career development goals and action plans in six steps: assess, set, track, celebrate, seek, and review.


How can you use project tracking tools to streamline project maintenance?

13 contributions

 1 month ago

Learn how to use project tracking tools to monitor and manage your software projects. Find out how they can automate tasks, improve communication, and provide…


How do you set realistic and achievable goals for your software project management improvement plan?

7 contributions

 5 days ago

Learn how to set realistic and achievable goals for your software project management improvement plan with these steps and tips. Improve your skills and performance…


How do you show the impact of your software team?

18 contributions

 3 months ago

Learn how to measure and communicate the value and achievements of your software team with best practices and tools.


What are the trade-offs between budget and quality in software project management?

6 contributions

 3 days ago

Learn about the common challenges and strategies for balancing budget and quality in software project management. Discover how to optimize the value and quality of…


What techniques can you use to manage project scope and requirements with many stakeholders?

64 contributions

 2 days ago

Learn some techniques to define, communicate, and control the scope and requirements of your software project with many stakeholders.


What are the top qualities of a successful software project manager?

19 contributions

 2 days ago

Learn about the six essential skills that can help you become a more effective and efficient software project manager, such as communication, leadership, technical,…


How do you align agile documentation and reporting with the project vision and goals?

19 contributions

 1 day ago

Learn how to align agile documentation and reporting with the project vision and goals, and how to provide value to your stakeholders and team.


How can you keep software developers engaged in projects?

25 contributions

 2 weeks ago

Learn how to foster a positive and productive environment for your software team, while also ensuring that the project goals are met. Discover six tips to keep…


How do you combine stakeholder management with other software processes?

18 contributions

 3 weeks ago

Learn how to identify, engage, and communicate with your software project stakeholders, and align stakeholder management with software processes.


How can you avoid communication barriers in software projects?

10 contributions

 1 month ago

Learn how to communicate more effectively and overcome common communication obstacles in software projects with these six tips.


How do you manage software projects differently?

23 contributions

 6 days ago

Learn six key principles and practices to manage software projects differently and deliver value to customers, stakeholders, and teams.


What software project management skills will you need for the future?

15 contributions

 5 days ago

Learn what skills software project managers will need for the future, such as agile mindset, technical competence, business acumen, leadership skills, communication…


What are some best practices for writing and maintaining software user manuals?

1 contribution

 4 days ago

Learn some best practices for creating and managing software user manuals that can help you deliver clear, consistent, and helpful information to your end-users.


How can you identify budget misalignments in a software project?

103 contributions

 1 week ago

Learn how to use tools and techniques to monitor and control your software project budget, and identify and fix any deviations from the planned baseline.


How can you ensure that your team learns from mistakes?

133 contributions

 1 week ago

Learn how to create a blameless environment, conduct effective retrospectives, implement corrective measures, and celebrate successes to learn from mistakes in…


What are the most common issues that arise during budget reviews for Software Project Management?

53 contributions

 1 week ago

Learn about the most common issues that arise during budget reviews for software project management, and how to address them effectively.


How can you address negative attitudes towards diversity and inclusion in project team management?

89 contributions

 1 week ago

Learn how to evaluate, educate, set, monitor, reward, and correct diversity and inclusion in your software project team. Boost your project creativity, innovation…


How can you measure employee engagement in Agile Software Development?

49 contributions

 3 weeks ago

Learn about the methods and tools that can help you assess and enhance employee engagement in agile software development, and the benefits it can bring to your…


What are the best ways to negotiate with upper management for project resources?

45 contributions

 3 weeks ago

Learn how to prepare and conduct successful negotiations for project resources with upper management. Discover negotiation strategies and tactics to secure the…


How can software project managers maintain employee engagement during conflicts?

53 contributions

 2 weeks ago

Learn how to deal with conflicts in software project management and maintain employee engagement with these tips and strategies.


How can you build a strong professional network when working remotely in software project management?

41 contributions

 1 week ago

Learn how to build a strong professional network online as a remote software project manager. Discover how to use social media, virtual events, online forums…


How can Software Project Managers create a culture of accountability?

40 contributions

 1 week ago

Learn how to create a culture of accountability in your software projects, and why it matters for collaboration and success.


How can you design projects that encourage user participation?

64 contributions

 1 month ago

Learn how to design software projects that encourage user participation from the start. Discover the benefits and methods of user involvement, motivation, and…


How can you maintain project coordination quality assurance on large-scale software projects?

58 contributions

 2 weeks ago

Learn how to use tips and tools to maintain project coordination quality assurance (PCQA) on large-scale software projects and avoid delays, errors, conflicts, and…


How can you describe your Agile methodology experience in a Software Project Management interview?

35 contributions

 2 weeks ago

Learn how to explain the basics, give examples, show skills, and be honest about your Agile experience in a software project management interview.


What are the most expensive mistakes to avoid when budgeting for software projects?

71 contributions

 3 weeks ago

Learn six tips and best practices to avoid the most expensive mistakes that can ruin your budget and your software project, such as lack of clarity, poor…


What are the best practices for managing project risks?

71 contributions

 2 weeks ago

Learn the best practices for managing project risks in software project management, such as identifying, analyzing, planning, monitoring, and communicating risks.


How can you communicate the benefits of agile project management to resistant stakeholders?

66 contributions

 3 weeks ago

Learn how to convince your stakeholders that agile project management can help them achieve their goals and solve their problems.


What are the best strategies for designing projects that optimize customer satisfaction and loyalty?

29 contributions

 2 weeks ago

Learn the best strategies for designing software projects that meet or exceed your customers' expectations and keep them coming back for more.


What are the best ways to design projects for cloud computing?

34 contributions

 1 week ago

Learn the best ways to design projects for cloud computing, based on common principles and practices. Choose the right cloud models, architecture, migration…


What are the best ways to address project coordination challenges early on?

22 contributions

 3 weeks ago

Learn some best practices to address project coordination challenges early on in software project management, such as defining the scope, establishing the…


What are the top considerations for a freelancer's project management plan?

24 contributions

 3 weeks ago

Learn the top considerations for a freelancer's project management plan in software projects, and how to define, estimate, manage, communicate, control, and…


How can you ensure all team members feel heard during conflict resolution?

51 contributions

 2 months ago

Learn how to listen actively, acknowledge emotions, seek feedback, involve everyone, and follow up during conflict resolution as a software project manager.


What makes project tracking tools essential for software project management?

36 contributions

 1 month ago

Learn what makes project tracking tools essential for software project management and how to choose and use them effectively. Discover the benefits, features…


What are the different types of test cases used in software project management?

28 contributions

 1 month ago

Learn about the different types of test cases for software project management, and how they can help you verify the quality and functionality of your software.


What is the difference between manual and automated testing?

33 contributions

 3 weeks ago

Learn the difference between manual and automated testing, and how to choose the best method for your software project. Discover the benefits and challenges of each…


How can you create user-friendly automated documentation tools for all stakeholders?

10 contributions

 1 week ago

Learn how to create documentation tools that are easy to use, maintain, and access for all stakeholders of your software projects. Discover tips and techniques for…


How do you balance project metrics with user satisfaction?

48 contributions

 2 months ago

Learn how to choose, collect, analyze, communicate, and act on the right project metrics and user satisfaction metrics for your software projects.


How do you estimate technical debt and rework costs?


48 contributions

 1 month ago

Learn how to measure and manage technical debt and rework costs in your software projects using methods and tools for software project estimation.


How can you structure your project documentation for maximum impact?


27 contributions

 2 weeks ago

Learn how to create and organize project documentation that is clear, useful, and impactful for your software project management. Follow these tips and best…


How can you help your team manage risk?


10 contributions

 1 week ago

Learn how to involve your team in risk management, from defining risk categories to communicating and monitoring risks, in this article on software project…


What are the best practices for using active listening in software project management?


10 contributions

 2 weeks ago

Learn how active listening can enhance your software project management performance, especially in conflict management, trust building, and collaboration…


How can you make decisions with limited information in Software Project Management?


29 contributions

 1 month ago

Learn how to make decisions with limited information in software project management using tools and techniques such as problem statements, decision trees, SWOT…


How can you motivate developers to work together on a project?


8 contributions

 2 weeks ago

Learn how to foster a positive and productive team culture by motivating developers to work together on a software project.


How do you improve software quality in every project phase?


43 contributions

 1 week ago

Learn how to apply proven practices and techniques to improve software quality in the planning, design, development, testing, and deployment phases of the software…


How can automated testing tools monitor software project progress?


12 contributions

 1 week ago

Learn how automated testing tools can provide feedback on quality, performance, and functionality of software products, and how to use them to monitor project…


What are the best strategies for ensuring stakeholder satisfaction during project implementation?


27 contributions

 2 weeks ago

Learn best strategies to keep your stakeholders satisfied and supportive during your software project implementation phase.


How can you explain project budgeting and resource allocation in an interview?


7 contributions

 2 weeks ago

Learn how to explain project budgeting and resource allocation in a software project manager interview, and show your skills and experience with examples and best…


How can you optimize your software project portfolio using the mean-variance model?


5 contributions

 2 weeks ago

Learn how to use the mean-variance model, a portfolio optimization method that evaluates the trade-off between the expected return and the variance of your software…
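
The trade-off named in the last item above can be sketched in a few lines of Python. This is only an illustration and is not taken from the linked article: the two "projects", their expected returns, and the covariance matrix are made-up numbers, and portfolio_stats is a hypothetical helper name.

```python
# Hypothetical two-project portfolio; every number here is invented for illustration.
def portfolio_stats(weights, expected_returns, covariance):
    """Return (expected return, variance) of a weighted portfolio."""
    exp_return = sum(w * r for w, r in zip(weights, expected_returns))
    n = len(weights)
    variance = sum(
        weights[i] * weights[j] * covariance[i][j]
        for i in range(n)
        for j in range(n)
    )
    return exp_return, variance

expected_returns = [0.10, 0.20]   # assumed return of project A and project B
covariance = [[0.04, 0.00],       # assumed variances on the diagonal,
              [0.00, 0.16]]       # zero covariance between the projects

# Sweep the share of the budget given to project A and show the trade-off.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    ret, var = portfolio_stats([share, 1 - share], expected_returns, covariance)
    print(f"share in A = {share:.2f}  expected return = {ret:.3f}  variance = {var:.4f}")
```

Moving budget toward the riskier project B raises the expected return, while mixing the two projects can give a lower variance than either project alone. Which mix is best depends on how much variance the decision maker is willing to accept.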




Tuesday, November 21, 2023

Qualitative Thinking and Research in Data Science

 

Why the Data Revolution Needs Qualitative Thinking

by Anissa Tanweer, Emily Kalah Gade, P.M. Krafft, and Sarah Dreier

Published on

Jul 30, 2021

Harvard Data Science Review

https://hdsr.mitpress.mit.edu/pub/u9s6f22y/release/4


In this article, we focus on a set of concepts that are intrinsically informed by particular epistemological and ontological positions common in qualitative social sciences, positions that seek to understand the contingently and subjectively constructed nature of the social world. We refer to these concepts as ‘sensibilities’ because we intend them to intervene on methodology in a sensitizing rather than prescriptive way. The three sensibilities we discuss are associated with certain kinds of methodological practices, and they can be coupled with multiple modes of data collection and analysis.





Sensibility: Interpretivism

Working definition: An epistemological approach probing the multiple and contingent ways that meaning is ascribed to objects, actions, and situations.

Example of related methods: Trace ethnography (Geiger & Ribes, 2011; Geiger & Halfaker, 2017)


Sensibility: Abductive reasoning

Working definition: A mode of inference that updates and builds upon preexisting assumptions based on new observations in order to generate a novel explanation for a phenomenon.

Example of related methods: Iterations of open coding, theoretical coding, and selective coding (Thornberg & Charmaz, 2013)


Sensibility: Reflexivity

Working definition: A process by which researchers systematically reflect upon their own positions relative to their object, context, and method of inquiry.

Example of related methods: Brain dumps, situational mapping, and toolkit critiques (Markham, 2017)


Abduction

Abduction is often described as “inference to the best explanation” (Douven, 2011). 

Abductive reasoning updates and builds upon preexisting assumptions (in other words, theories) based on new observations in order to generate a novel explanation for a phenomenon. As such, it demarks “a creative outcome which engenders a new idea.”

Qualitative researchers who use abductive reasoning have developed ways of addressing the relationships between prior assumptions, new observations, and newly derived explanations. These practices can be incorporated into data science.


The labeling of data in qualitative methods (what qualitative researchers would instead call ‘coding’) is not a matter of mere assumption, but rather a systematic part of the theory-building process.



LinkedIn AI article

What are the most common reasoning frameworks for data science?

https://www.linkedin.com/advice/0/what-most-common-reasoning-frameworks-data-science-41fif

I made a contribution to the above article.

The above article is in the decision-making series:

https://www.linkedin.com/pulse/topics/soft-skills-s2976/decision-making-s2506/

Jan 11, 2023

AI concepts for beginners - Exploring abductive reasoning in AI

https://indiaai.gov.in/article/exploring-abductive-reasoning-in-ai

Sunday, November 19, 2023

Applications of Data Mining Techniques in Stock Selection


Stock Portfolio Selection using Data Mining Approach
November 2013, IOSR Journal of Engineering 3(11):42-48
DOI:10.9790/3021-031114248


Bayesian Networks



Decision Trees


Fuzzy Sets

Neural Networks

Rough Sets



Fundamental Analysis of Stock Price Artificial Neural Network Model based on Rough Set Theory
Wei Wu, Jiuping Ju, World Journal of Modeling and Simulation, Vol. 2, (2006), No. 1, pp. 36-44
http://www.worldacademicunion.com/journal/1746-7233WJMS/WJMSvol2no1paper4.pdf

Article in Transactions on Rough Sets XVII, March 2014
https://books.google.co.in/books?id=Mky7BQAAQBAJ

Rough Sets in Economics and Finance - Has a section on applications in portfolio selection.        





Ud. 19.11.2023
Pub. 14.5.2015

Saturday, November 18, 2023

Machine Learning - Introduction



What is machine learning?

A learning machine, broadly defined, is any device whose actions are influenced by past experiences.
— Nils J. Nilsson



Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."[Mitchell, T. (1997). Machine Learning. McGraw Hill. p. 2.]
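
Mitchell's three ingredients can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the task T is deciding whether a number is at least 50, the experience E is a short list of labeled examples, and the performance measure P is accuracy on the integers 0 to 99. The learner, learn_threshold, is a hypothetical helper that guesses the boundary as the midpoint between the largest negative and the smallest positive example seen so far.

```python
def true_label(x):
    """Task T: is the number at least 50? (The learner never sees this rule.)"""
    return x >= 50

def learn_threshold(examples):
    """Guess the boundary as the midpoint between the largest negative
    and the smallest positive example seen so far."""
    negatives = [x for x, label in examples if not label]
    positives = [x for x, label in examples if label]
    return (max(negatives) + min(positives)) / 2

def accuracy(threshold, test_inputs):
    """Performance measure P: fraction of test inputs classified correctly."""
    correct = sum((x >= threshold) == true_label(x) for x in test_inputs)
    return correct / len(test_inputs)

# Experience E: labeled examples, revealed to the learner one at a time.
experience = [(10, False), (70, True), (44, False), (52, True), (49, False)]
test_inputs = range(100)

scores = []
for n in range(2, len(experience) + 1):
    threshold = learn_threshold(experience[:n])
    scores.append(accuracy(threshold, test_inputs))
    print(f"examples seen = {n}  learned threshold = {threshold}  accuracy = {scores[-1]}")
```

With these hand-picked examples the accuracy rises as more examples are seen. In general an extra example is not guaranteed to help at every step; the definition only asks that performance improves with experience overall.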



Adam Geitgey puts it this way: machine learning is using generic algorithms to tell you something interesting about your data without writing any code specific to the problem you are solving.

The following is a similar explanation: you do not need to write a program specific to the work required.

Machine learning, as a type of artificial intelligence (AI), enables computers to learn without being explicitly programmed, and to improve their functions when exposed to new data. By analyzing patterns in this data, the machine learning algorithms are self-adjusting based on a set of design rules.

http://www.softvision.com/blog/what-is-machine-learning/


Machine learning was initially part of AI, but AI research later moved away from it. Machine learning (ML), reorganized and recognized as its own field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics, fuzzy logic, and probability theory.

Machine learning and Data mining

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).

In machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge. 

https://en.wikipedia.org/wiki/Machine_learning

Statistics
Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder to call the overall field.


https://web.archive.org/web/20171018192328/https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt/?context=3

Articles in Medium by Adam Geitgey

1
https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.w55suff6b

Machine Learning is using generic algorithms to tell you something interesting about your data without writing any code specific to the problem you are solving.


2

Deep Learning and Convolutional Neural Networks


4

A method invented in 2005 called Histogram of Oriented Gradients — or just HOG for short.
Algorithm - face landmark estimation.


5

Sequence-to-sequence learning.

Statistical machine translation systems perform much better than rule-based systems if you give them enough training data. Franz Josef Och improved on these ideas and used them to build Google Translate in the early 2000s. Machine Translation was finally available to the world.

A recurrent neural network (or RNN for short) is a slightly tweaked version of a neural network where the previous state of the neural network is one of the inputs to the next calculation. This means that previous calculations change the results of future calculations!
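
The feedback described above can be sketched with a one-unit recurrent cell in plain Python. This is a minimal illustration and not code from Geitgey's article: the weights w_x and w_h, the tanh activation, and the function names are assumptions chosen to keep the example small.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step: the new state depends on the current input AND the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell, carrying the state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Two sequences that differ only in their first element.
with_signal = run_rnn([1.0, 0.0, 0.0])
without_signal = run_rnn([0.0, 0.0, 0.0])

print(with_signal)
print(without_signal)
```

The last two inputs are identical in both sequences, yet the final states differ, because the first input is still echoing through the carried-over state. That is the sense in which previous calculations change the results of future ones.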


The idea of turning a face into a list of measurements is an example of an encoding. We are taking raw data (a picture of a face) and turning it into a list of measurements that represent it (the encoding).

6

The algorithm (roughly) described here to deal with variable-length audio is called Connectionist Temporal Classification or CTC. 


7

The new system is called Deep Convolutional Generative Adversarial Networks (or DCGANs for short).

How DCGANs work
To build a DCGAN, we create two deep neural networks. Then we make them fight against each other, endlessly attempting to out-do one another. In the process, they both become stronger.


---------------------

Machine Learning Explained
https://blog.dataiku.com/machine-learning-explained-algorithms-are-your-friend


Machine Learning MIT Course - Course Materials

http://www.ai.mit.edu/courses/6.867-f04/lectures.html


Machine Learning Cheatsheet - SAS


https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/



How To Become A Machine Learning Engineer: Learning Path
Aug 19, 2017
https://hackernoon.com/learning-path-for-machine-learning-engineer-a7d5dc9de4a4


Machine learning - Notes
http://www.holehouse.org/mlclass/


Updated 18.11.2023, 21 July 2021, 15 July 2018, 24 June 2018, 6 October 2017, 23 August 2017, 30 July 2016

Thursday, November 16, 2023

Harvard Data Science Review - Issues and Articles - Information

 

Summer 2019 - Vol. 1, Issue 1

https://hdsr.mitpress.mit.edu/volume1issue1


Data Science: An Artificial Ecosystem

Issue 1.1 / Summer 2019

by Xiao-Li Meng

Published on

Jul 02, 2019

https://hdsr.mitpress.mit.edu/pub/jhy4g6eg/release/9?readingCollection=72befc2a


Data Science - Case Studies

 


https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/10


https://towardsdatascience.com/case-study-applying-a-data-science-process-model-to-a-real-world-scenario-93ae57b682bf


https://www.cambridgespark.com/en-gb/case-studies/carrefour-case-study


Inventory Analysis Case Study Data files: - PWC

https://www.pwc.com/us/en/careers/university-relations/data-and-analytics-case-studies-files.html



A Google search for "data science case study" also returns interesting results.

Data Analytics and Data Mining - Difference Explained

Data analytics can be classified into three categories:

Descriptive analytics: Describes the collected data or dataset with clear visualization and summary.

Predictive analytics: Predicts the future behavior of interest and provides scenario analysis.

Prescriptive analytics: Makes or suggests smart decisions based on the predictive results. It optimizes the solution using the results of predictive analytics.
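
The three categories can be sketched end to end on a toy example. All of it is assumed for illustration: the monthly sales figures, the unit price and cost, the candidate order quantities, and the helper names fit_line and profit are hypothetical. A straight-line trend stands in for the predictive model.

```python
from statistics import mean

monthly_sales = [100, 110, 120, 130, 140]  # hypothetical units sold per month

# Descriptive analytics: summarize what has already happened.
summary = {
    "mean": mean(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# Predictive analytics: fit a least-squares trend line and forecast next month.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(monthly_sales)
forecast = intercept + slope * len(monthly_sales)

# Prescriptive analytics: choose the order quantity that maximizes profit
# under the forecast demand (assumed unit price 10, unit cost 6).
def profit(order_qty, demand, price=10, cost=6):
    return price * min(order_qty, demand) - cost * order_qty

candidates = [130, 140, 150, 160, 170]
best_order = max(candidates, key=lambda q: profit(q, forecast))

print(summary)
print("forecast for next month:", forecast)
print("recommended order quantity:", best_order)
```

Each stage feeds the next: the summary describes the past, the trend line predicts demand, and the decision step uses that prediction to recommend an action.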

All three steps or categories of data analytics have to be used to make a decision based on data. To make data analytics valid and effective across many different decisions, a company needs to involve at least three kinds of people with different skills:

Business experts: Some of them set the problem objective, and some provide the decision model, which is based on domain knowledge. The decision model indicates the data to be collected, the processes from which the data will be collected, and the period for which data needs to be collected.

Information technology experts: They design the database which is likely to be filled during transaction processing, and they also manage the database.

Data analysis experts: They understand data mining, statistical, and operations research (OR) techniques.


Data analytics, as explained, is an objective-oriented process that aims to make smart decisions. The goal is set first, and the data is analyzed to reach the decision that helps achieve the goal efficiently.

Data mining focuses on identifying undiscovered patterns and establishing hidden relationships embedded in the dataset. Data mining is a part of the predictive analytics method.



Ud. 16.11.2023
First published  15.5.2015

Hui Lin and Ming Li - Practitioner’s Guide to Data Science - Book Information - Notes

Contents

List of Figures
List of Tables
Preface
About the Authors

1 Introduction
1.1 A Brief History of Data Science
1.2 Data Science Role and Skill Tracks
1.2.1 Engineering
1.2.2 Analysis
1.2.3 Modeling/Inference
1.3 What Kind of Questions Can Data Science Solve?
1.3.1 Prerequisites
1.3.2 Problem Type
1.4 Structure of Data Science Team
1.5 Data Science Roles

2 Soft Skills for Data Scientists
2.1 Comparison between Statistician and Data Scientist
2.2 Beyond Data and Analytics
2.3 Three Pillars of Knowledge
2.4 Data Science Project Cycle
2.4.1 Types of Data Science Projects
2.4.2 Problem Formulation and Project Planning Stage
2.4.3 Project Modeling Stage
2.4.4 Model Implementation and Post Production Stage
2.4.5 Project Cycle Summary
2.5 Common Mistakes in Data Science
2.5.1 Problem Formulation Stage
2.5.2 Project Planning Stage
2.5.3 Project Modeling Stage
2.5.4 Model Implementation and Post Production Stage
2.5.5 Summary of Common Mistakes

3 Introduction to the Data
3.1 Customer Data for a Clothing Company
3.2 Swine Disease Breakout Data
3.3 MNIST Dataset
3.4 IMDB Dataset

4 Big Data Cloud Platform
4.1 Power of Cluster of Computers
4.2 Evolution of Cluster Computing
4.2.1 Hadoop
4.2.2 Spark
4.3 Introduction of Cloud Environment
4.3.1 Open Account and Create a Cluster
4.3.2 R Notebook
4.3.3 Markdown Cells
4.4 Leverage Spark Using R Notebook
4.5 Databases and SQL
4.5.1 History
4.5.2 Database, Table and View
4.5.3 Basic SQL Statement
4.5.4 Advanced Topics in Database

5 Data Pre-processing
5.1 Data Cleaning
5.2 Missing Values
5.2.1 Impute missing values with median/mode
5.2.2 K-nearest neighbors
5.2.3 Bagging Tree
5.3 Centering and Scaling
5.4 Resolve Skewness
5.5 Resolve Outliers
5.6 Collinearity
5.7 Sparse Variables
5.8 Re-encode Dummy Variables

6 Data Wrangling
6.1 Summarize Data
6.1.1 dplyr package
6.1.2 apply(), lapply() and sapply() in base R
6.2 Tidy and Reshape Data

7 Model Tuning Strategy
7.1 Variance-Bias Trade-Off
7.2 Data Splitting and Resampling
7.2.1 Data Splitting
7.2.2 Resampling

8 Measuring Performance
8.1 Regression Model Performance
8.2 Classification Model Performance
8.2.1 Confusion Matrix
8.2.2 Kappa Statistic
8.2.3 ROC
8.2.4 Gain and Lift Charts

9 Regression Models
9.1 Ordinary Least Square
9.1.1 The Magic P-value
9.1.2 Diagnostics for Linear Regression
9.2 Principal Component Regression and Partial Least Square

10 Regularization Methods
10.1 Ridge Regression
10.2 LASSO
10.3 Elastic Net
10.4 Penalized Generalized Linear Model
10.4.1 Introduction to glmnet package
10.4.2 Penalized logistic regression

11 Tree-Based Methods
11.1 Tree Basics
11.2 Splitting Criteria
11.2.1 Gini impurity
11.2.2 Information Gain (IG)
11.2.3 Information Gain Ratio (IGR)
11.2.4 Sum of Squared Error (SSE)
11.3 Tree Pruning
11.4 Regression and Decision Tree Basic
11.4.1 Regression Tree
11.4.2 Decision Tree
11.5 Bagging Tree
11.6 Random Forest
11.7 Gradient Boosted Machine
11.7.1 Adaptive Boosting
11.7.2 Stochastic Gradient Boosting

12 Deep Learning
12.1 Feedforward Neural Network
12.1.1 Logistic Regression as Neural Network
12.1.2 Stochastic Gradient Descent
12.1.3 Deep Neural Network
12.1.4 Activation Function
12.1.5 Optimization
12.1.6 Deal with Overfitting
12.1.7 Image Recognition Using FFNN
12.2 Convolutional Neural Network
12.2.1 Convolution Layer
12.2.2 Padding Layer
12.2.3 Pooling Layer
12.2.4 Convolution Over Volume
12.2.5 Image Recognition Using CNN
12.3 Recurrent Neural Network
12.3.1 RNN Model
12.3.2 Long Short Term Memory
12.3.3 Word Embedding
12.3.4 Sentiment Analysis Using RNN

Appendix

13 Handling Large Local Data
13.1 readr
13.2 data.table - enhanced data.frame

14 R code for data simulation
14.1 Customer Data for Clothing Company
14.2 Swine Disease Breakout Data

Bibliography

Index






What is Data Science - 2023 Multiple Explanations

 

BLOG@CACM

What is Data Science?

By Koby Mike, Orit Hazzan

Communications of the ACM, February 2023, Vol. 66 No. 2, Pages 12-13

https://cacm.acm.org/magazines/2023/2/268943-what-is-data-science/fulltext

References

1. Alvargonzález, D. Multidisciplinarity, interdisciplinarity, transdisciplinarity, and the sciences. International Studies in the Philosophy of Science, 25(4), 2011, 387–403. https://doi.org/10.1080/02698595.2011.623366

3. Chang, W. L., Grady, N., et al. NIST Big Data Interoperability Framework: Volume 1, Big Data Definitions, 2015.

4. Conway, D. The data science Venn diagram. Dataists, 2010. http://www.dataists.com/2010/09/the-data-science-venn-diagram/

5. Davenport, T. H. and Patil, D. Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 2012, 70–76.

7. Gray, J. EScience – A transformed scientific method. http://research.microsoft.com/en-us/um/people/gray/talks/NRC-CSTB_eScience.ppt, 2007.

9. Irizarry, R. A. The role of academia in data science education, 2020.

11. Skiena, S. S. The data science design manual. Springer, 2017.

12. Taylor, D. Battle of the Data Science Venn Diagrams. KDnuggets. https://www.kdnuggets.com/battle-of-the-data-science-venn-diagrams.html/, 2016.


https://www.linkedin.com/pulse/data-science-process-methodology-pratibha-kumari-jha/


https://www.linkedin.com/pulse/methodology-data-science-andre-luiz-coelho-da-silva/?trk=organization_guest_main-feed-card_reshare_feed-article-content


Data Science vs Data Analytics: What Are the Similarities & Differences?

The main differences between data science and data analytics involve the methods and tools for working with data, as well as career paths, titles & salaries.

RICE UNIVERSITY

Department of Computer Science

https://csweb.rice.edu/academics/graduate-programs/online-mds/blog/data-science-vs-data-analytics


https://www.nature.com/articles/s41562-023-01562-4

Overview of Data Science - Oracle Cloud

Oracle Cloud Infrastructure (OCI) Data Science is a fully managed and serverless platform for data science teams to build, train, and manage machine learning models.

https://docs.oracle.com/en-us/iaas/data-science/using/overview.htm

March 29, 2023
Data Science in Finance
CMU Explanation - Detailed explanation

















Wednesday, November 15, 2023

Data Science - Books - Bibliography - Introduction

 

https://towardsdatascience.com/learn-on-towards-data-science-52245bc91451


Practitioner’s Guide to Data Science

By Hui Lin, Ming Li

1st Edition

First Published 2023

eBook Published 24 May 2023




Based on industry experience, this book outlines real-world scenarios and discusses pitfalls that data science practitioners should avoid. It also covers the big data cloud platform and the art of data science, such as soft skills. The authors use R as the primary tool and provide code for both R and Python. 

This book is for readers who want to explore possible career paths and eventually become data scientists. This book comprehensively introduces various data science fields, soft and programming skills in data science projects, and potential career paths. Traditional data-related practitioners such as statisticians, business analysts, and data analysts will find this book helpful in expanding their skills for future data science careers. Undergraduate and graduate students from analytics-related areas will find this book beneficial to learn real-world data science applications. Non-mathematical readers will appreciate the reproducibility of the companion R and Python code.


Key Features:

• It is hands-on. We provide the data and repeatable R and Python code in notebooks. Readers can repeat the analysis in the book using the data and code provided. We also suggest that readers modify the notebook to perform analyses with their data and problems, if possible. The best way to learn data science is to do it!



TABLE OF CONTENTS

Chapter 1|28 pages

Introduction

 

Chapter 2|18 pages

Soft Skills for Data Scientists

 

Chapter 3|8 pages

Introduction to the Data

 

Chapter 4|22 pages

Big Data Cloud Platform

 

Chapter 5|26 pages

Data Pre-processing

 

Chapter 6|22 pages

Data Wrangling

 

Chapter 7|26 pages

Model Tuning Strategy

 

Chapter 8|16 pages

Measuring Performance

 

Chapter 9|20 pages

Regression Models

 

Chapter 10|30 pages

Regularization Methods

 

Chapter 11|42 pages

Tree-Based Methods

 

Chapter 12|78 pages

Deep Learning

 


https://linhui.org/hui's_files/datascientist1#(20)

https://scholar.google.com/citations?user=PAArLQIAAAAJ&hl=en

https://linhui.org/

https://github.com/happyrabbit

https://scientistcafe.com/

A Tour of Data Science: Learn R and Python in Parallel

Nailong Zhang

CRC Press, 11-Nov-2020 - Computers - 216 pages (C) 2021.

A Tour of Data Science: Learn R and Python in Parallel covers the fundamentals of data science, including programming, statistics, optimization, and machine learning in a single short book. It does not cover everything, but rather, teaches the key concepts and topics in Data Science. It also covers two of the most popular programming languages used in Data Science, R and Python, in one source.

Key features:

Allows you to learn R and Python in parallel

Covers statistics, programming, optimization and predictive modelling, and the popular data manipulation tools data.table and pandas

Provides a concise and accessible presentation

Includes machine learning algorithms implemented from scratch, such as linear regression, lasso, ridge, logistic regression, and gradient boosting trees

Appeals to data scientists, statisticians, quantitative analysts, and others who want to learn programming with R and Python from a data science perspective.

A Hands-On Introduction to Data Science

Chirag Shah

Cambridge University Press, 02-Apr-2020 - Business & Economics - 424 pages


This book introduces the field of data science in a practical and accessible manner.

The foundational ideas and techniques of data science are provided, allowing students to easily develop a firm understanding of the subject. The material will have continual relevance even after tools and technologies change.

Using popular data science tools such as Python and R, the book offers many examples of real-life applications, with practice ranging from small to big data. A suite of online material for both instructors and students provides a strong supplement to the book, including datasets, chapter slides, solutions, sample exams and curriculum suggestions. This entry-level textbook is ideally suited to readers from a range of disciplines wishing to build a practical, working knowledge of data science.

https://books.google.co.in/books?id=rljPDwAAQBAJ

Data Science Job: How to become a Data Scientist

Przemek Chojecki, 31-Jan-2020 - Computers - 100 pages

Data Scientist is one of the hottest jobs on the market right now. Demand for data science is huge and will only grow, and it seems like it will grow much faster than the actual number of data scientists. So if you want to make a career change and become a data scientist, now is the time.

This book will guide you through the process. From my experience of working with multiple companies as a project manager, a data science consultant or a CTO, I was able to see the process of hiring data scientists and building data science teams. I know what’s important to land your first job as a data scientist, what skills you should acquire, and what you should show during a job interview.

https://books.google.co.in/books?id=h0PZDwAAQBAJ


Foundations of Data Science

Avrim Blum, John Hopcroft, Ravindran Kannan

Cambridge University Press, 23-Jan-2020 - Computers - 432 pages

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. 

Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. 

Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.

https://books.google.co.in/books?id=koHCDwAAQBAJ


Data Science and Intelligent Applications: Proceedings of ICDSIA 2020

Ketan Kotecha, Vincenzo Piuri, Hetalkumar N. Shah, Rajan Patel

Springer Nature, 17-Jun-2020 - Technology & Engineering - 576 pages

This book includes selected papers from the International Conference on Data Science and Intelligent Applications (ICDSIA 2020), hosted by Gandhinagar Institute of Technology (GIT), Gujarat, India, on January 24–25, 2020. The proceedings present original and high-quality contributions on theory and practice concerning emerging technologies in the areas of data science and intelligent applications. The conference provides a forum for researchers from academia and industry to present and share their ideas, views and results, while also helping them approach the challenges of technological advancements from different viewpoints.


The contributions cover a broad range of topics, including: collective intelligence, intelligent systems, IoT, fuzzy systems, Bayesian networks, ant colony optimization, data privacy and security, data mining, data warehousing, big data analytics, cloud computing, natural language processing, swarm intelligence, speech processing, machine learning and deep learning, and intelligent applications and systems. Helping strengthen the links between academia and industry, the book offers a valuable resource for instructors, students, industry practitioners, engineers, managers, researchers, and scientists alike.

p.217 Human activity recognition

https://books.google.co.in/books?id=eSbsDwAAQBAJ


© 2020

Data Science and Productivity Analytics

Editors: Vincent Charles, Juan Aparicio, Joe Zhu


Table of contents (15 chapters)

Data Envelopment Analysis and Big Data: Revisit with a Faster Method Pages 1-34

Khezrimotlagh, Dariush (et al.)

Data Envelopment Analysis (DEA): Algorithms, Computations, and Geometry Pages 35-56

Dulá, José H.

An Introduction to Data Science and Its Applications  Pages 57-81

Rabasa, Alex (et al.)

Identification of Congestion in DEA Pages 83-119

Mehdiloo, Mahmood (et al.)

Data Envelopment Analysis and Non-parametric Analysis Pages 121-160

Villa, Gabriel (et al.)

The Measurement of Firms’ Efficiency Using Parametric Techniques Pages 161-199

Orea, Luis

Fair Target Setting for Intermediate Products in Two-Stage Systems with Data Envelopment Analysis Pages 201-226

An, Qingxian (et al.)

Fixed Cost and Resource Allocation Considering Technology Heterogeneity in Two-Stage Network Production Systems Pages 227-249

Ding, Tao (et al.)

Efficiency Assessment of Schools Operating in Heterogeneous Contexts: A Robust Nonparametric Analysis Using PISA 2015 Pages 251-277

Cordero, Jose Manuel (et al.)

A DEA Analysis in Latin American Ports: Measuring the Performance of Guayaquil Contecon Port Pages 279-309

Morales-Núñez, Emilio J. (et al.)

Effects of Locus of Control on Bank’s Policy—A Case Study of a Chinese State-Owned Bank Pages 311-335

Xu, Cong (et al.)

A Data Scientific Approach to Measure Hospital Productivity Pages 337-358

Daneshvar Rouyendegh (B. Erdebilli), Babak (et al.)

Environmental Application of Carbon Abatement Allocation by Data Envelopment Analysis Pages 359-389

Yu, Anyu (et al.)

Pension Funds and Mutual Funds Performance Measurement with a New DEA (MV-DEA) Model Allowing for Missing Variables Pages 391-413

Badrizadeh, Maryam (et al.)

Sharpe Portfolio Using a Cross-Efficiency Evaluation Pages 415-439

Landete, Mercedes (et al.)

https://www.springer.com/gp/book/9783030433833



Special Issue on Data Science for Better Productivity

Data science for better productivity

Vincent Charles, Juan Aparicio & Joe Zhu

Journal of the Operational Research Society 

Volume 72, 2021 - Issue 5: Special Issue Data Science for Better Productivity



Afsharian, M. (2019). A frontier-based facility location problem with a centralised view of measuring the performance of the network. Journal of the Operational Research Society, 72(5), 1058–1074. https://doi.org/10.1080/01605682.2019.1639476   

Bougnol, M.-L., & Dulá, J. (2020). Improving productivity using government data: The case of US Centers for Medicare & Medicaid's ‘Nursing Home Compare’. Journal of the Operational Research Society, 72(5), 1075–1086. https://doi.org/10.1080/01605682.2020.1724056

Del Vecchio, M., Kharlamov, A., Parry, G., & Pogrebna, G. (2020). Improving productivity in Hollywood with data science: Using emotional arcs of movies to drive product and service innovation in entertainment industries. Journal of the Operational Research Society, 72(5), 1110–1137. https://doi.org/10.1080/01605682.2019.1705194   

Grimaldi, D., Fernandez, V., & Carrasco, C. (2019). Exploring data conditions to improve business performance. Journal of the Operational Research Society, 72(5), 1087–1098. https://doi.org/10.1080/01605682.2019.1590136   

Ihrig, S., Ishizaka, A., Brech, C., & Fliedner, T. (2019). A new hybrid method for the fair assignment of productivity targets to indirect corporate processes. Journal of the Operational Research Society, 72(5), 989–1001. https://doi.org/10.1080/01605682.2019.1639477   

Jiang, R., Yang, Y., Chen, Y., & Liang, L. (2019). Corporate diversification, firm productivity and resource allocation decisions: The data envelopment analysis approach. Journal of the Operational Research Society, 72(5), 1002–1014. https://doi.org/10.1080/01605682.2019.1568841   

Li, Y., & Chen, W. (2019). Entropy method of constructing a combined model for improving loan default prediction: A case study in China. Journal of the Operational Research Society, 72(5), 1099–1109. https://doi.org/10.1080/01605682.2019.1702905   

Lin, S.-W., Lu, W.-M., & Lin, F. (2020). Entrusting decisions to the public service pension fund: An integrated predictive model with additive network DEA approach. Journal of the Operational Research Society, 72(5), 1015–1032. https://doi.org/10.1080/01605682.2020.1718011   

Routh, P., Roy, A., & Meyer, J. (2020). Estimating customer churn under competing risks. Journal of the Operational Research Society, 72(5), 1138–1155. https://doi.org/10.1080/01605682.2020.1776166   

Shi, Y., Zhu, J., & Charles, V. (2020). Data science and productivity: A bibliometric review of data science applications and approaches in productivity evaluations. Journal of the Operational Research Society, 72(5), 975–988. https://doi.org/10.1080/01605682.2020.1860661   

Summerfield, N. S., Deokar, A. V., Xu, M., & Zhu, W. (2020). Should drivers cooperate? Performance evaluation of cooperative navigation on simulated road networks using network DEA. Journal of the Operational Research Society, 72(5), 1042–1057. https://doi.org/10.1080/01605682.2019.1700766   

Zhu, J. (2020). DEA under big data: Data enabled analytics and network data envelopment analysis. Annals of Operations Research, 1–23. In press. https://doi.org/10.1007/s10479-020-03668-8 

Zhu, W., Liu, B., Lu, Z., & Yu, Y. (2020). A DEALG methodology for prediction of effective customers of internet financial loan products. Journal of the Operational Research Society, 72(5), 1033–1041. https://doi.org/10.1080/01605682.2019.1700188

https://www.tandfonline.com/doi/full/10.1080/01605682.2021.1892466



Ud. 16.11.2023, 3.45 am Austin, Texas

Pub. 16.7.2021














What is Data Science? - An Introduction to Data Science - New Developments


What is Data Science? - An Introduction to Data Science


Data-driven, or data-analysis-driven, decision making is age-old. But new data processing technology allows people to process data in ways that were not done before. Hence data will drive business decisions much more intensively in the next decade.


IT departments are no longer content with just providing technology for processing data. The discipline and profession of IT is getting involved in finding and understanding the relevance of new data sources, big and small.

The practice of business intelligence is expanding to develop capabilities for analyzing and visualizing structured and unstructured data for their relevance to business decision making, and then building applications that run periodically, at intervals as short as a few seconds, to support activities such as crime or fraud prevention.

Data science is the name of this emerging discipline.

Data Science Tutorial 1 - Video

edureka!

More videos on data science are available on YouTube.




Concise Visual Summary of Deep Learning Architectures
Basically neural network architectures
http://www.datasciencecentral.com/profiles/blogs/concise-visual-summary-of-deep-learning-architectures


http://www.datasciencecentral.com has a number of articles on data science.


Data Science - New Developments

2023



50 Years of Data Science
David Donoho
Journal of Computational and Graphical Statistics
Volume 26, 2017 - Issue 4
Pages 745-766  Published online: 19 Dec 2017
https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734

2020
The 2020 Data Science Dictionary—Key Terms You Need to Know
https://www.datasciencecentral.com/profiles/blogs/top-data-science-skills-for-2020-1

Trends in Artificial Intelligence and Data Science for 2020
https://www.datasciencecentral.com/profiles/blogs/trends-in-artificial-intelligence-and-data-science-for-2020-by

Top 5 Data Science Trends for 2020
https://www.datasciencecentral.com/profiles/blogs/top-5-data-science-trends-for-2020



Updated on 14 March 2020

7 June 2017, 2 September 2014